Abstract

We consider the estimation of the slope function in functional linear regression, where
scalar responses are modeled in dependence of random functions. Cardot and Johannes
[2010] have shown that a thresholded projection estimator can attain, up to a constant,
minimax rates of convergence in a general framework which covers the prediction
problem with respect to the mean squared prediction error as well as the estimation of the
slope function and its derivatives. This estimation procedure, however, requires an optimal
choice of a tuning parameter with regard to certain characteristics of the slope function
and the covariance operator associated with the functional regressor. As this information
is usually inaccessible in practice, we investigate a fully data-driven choice of the tuning
parameter which combines model selection and Lepski's method. It is inspired by the recent
work of Goldenshluger and Lepski [2011]. The tuning parameter is selected as the minimizer
of a stochastic penalized contrast function imitating Lepski's method among a random
collection of admissible values. This choice of the tuning parameter depends only on the
data, and we show that within the general framework the resulting data-driven thresholded
projection estimator can attain minimax rates up to a constant over a variety of classes of
slope functions and covariance operators. The results are illustrated considering different
configurations which cover in particular the prediction problem as well as the estimation
of the slope and its derivatives.
1 Introduction



In functional linear regression the dependence of a real-valued response $Y$ on the variation of a
random function $X$ is studied. Typically the functional regressor $X$ is assumed to be
square-integrable or more generally to take its values in a separable Hilbert space $\mathbb{H}$ with inner product
$\langle\cdot,\cdot\rangle_{\mathbb{H}}$ and norm $\|\cdot\|_{\mathbb{H}}$. Furthermore, we suppose that $Y$ and $X$ are centered, which simplifies
the notation, and that the dependence between $Y$ and $X$ is linear in the sense that

$$Y = \langle\beta, X\rangle_{\mathbb{H}} + \sigma\varepsilon, \qquad \sigma > 0, \tag{1.1}$$

for some slope function $\beta \in \mathbb{H}$ and error term $\varepsilon$ with mean zero and variance one. Assuming
an independent and identically distributed (i.i.d.) sample of $(Y,X)$, the objective of this paper
is the construction of a fully data-driven estimation procedure of the slope function $\beta$ which
still can attain minimax-optimal rates of convergence.

Functional linear models have become very important in a diverse range of disciplines,
including medicine, linguistics, chemometrics as well as econometrics (see for instance Ramsay and
Silverman [2005] and Ferraty and Vieu [2006] for several case studies, or, more specifically, Forni
and Reichlin [1998] and Preda and Saporta [2005] for applications in economics). The main
class of estimation procedures of the slope function studied in the statistical literature is based
on principal components regression (see e.g. Bosq [2000], Frank and Friedman [1993], Cardot
et al. [1999], Cardot et al. [2007] or Müller and Stadtmüller [2005] in the context of
generalized linear models). The second important class of estimators relies on minimizing a penalized
least squares criterion which can be seen as a generalization of ridge regression (cf. Marx
and Eilers [1999] and Cardot et al. [2003]). More recently an estimator based on dimension
reduction and threshold techniques has been proposed by Cardot and Johannes [2010] which
borrows ideas from the inverse problems community (Efromovich and Koltchinskii [2001] and
Hoffmann and Reiß [2008]). It is worth noting that all the proposed estimation procedures rely
on the choice of at least one tuning parameter, which in turn crucially influences the attainable
accuracy of the constructed estimator.

It has been shown, for example in Cardot and Johannes [2010], that the attainable accuracy
of an estimator of the slope $\beta$ is essentially determined by a priori conditions imposed on both
the slope function and the covariance operator $\Gamma$ associated with the random function $X$ (defined
below). These conditions are usually captured by suitably chosen classes $\mathcal{F} \subset \mathbb{H}$ and $\mathcal{G}$ of slope
functions and covariance operators respectively. Typically, the class $\mathcal{F}$ characterizes the level
of smoothness of the slope function, while the class $\mathcal{G}$ specifies the decay of the sequence of
eigenvalues of $\Gamma$. For example, Hall and Horowitz [2007] and Crambes et al. [2009] consider
differentiable slope functions and a polynomial decay of the eigenvalues of $\Gamma$. Furthermore,
given a weighted norm $\|\cdot\|_\omega$ and the completion $\mathcal{F}_\omega$ of $\mathbb{H}$ with respect to $\|\cdot\|_\omega$, we shall measure
the performance of an estimator $\hat\beta$ of $\beta$ by its maximal $\mathcal{F}_\omega$-risk over a class $\mathcal{F} \subset \mathcal{F}_\omega$ of slope
functions and a class $\mathcal{G}$ of covariance operators, that is

$$\mathcal{R}_\omega[\hat\beta;\mathcal{F},\mathcal{G}] := \sup_{\beta\in\mathcal{F}}\,\sup_{\Gamma\in\mathcal{G}}\,\mathbb{E}\|\hat\beta-\beta\|_\omega^2.$$
This general framework with an appropriate choice of the weighted norm $\|\cdot\|_\omega$ allows us to cover
the prediction problem with respect to the mean squared prediction error (see e.g. Cardot et al.
[2003] or Crambes et al. [2009]) and the estimation not only of the slope function (see e.g. Hall
and Horowitz [2007]) but also of its derivatives. For a detailed discussion, we refer to Cardot
and Johannes [2010]. Having these applications in mind, the additional condition $\mathcal{F} \subset \mathcal{F}_\omega$
only means that the estimation of a derivative of the slope function necessitates its existence.
Assuming an i.i.d. sample of $(Y,X)$ of size $n$ obeying model (1.1), Cardot and Johannes [2010]
have derived a lower bound of the maximal weighted risk, that is

$$\inf_{\tilde\beta}\,\mathcal{R}_\omega[\tilde\beta;\mathcal{F},\mathcal{G}] \ge C\,\mathcal{R}^*_\omega[n;\mathcal{F},\mathcal{G}]$$

for some finite positive constant $C$, where the infimum is taken over all possible estimators $\tilde\beta$.
Moreover, they have shown that a thresholded projection estimator $\hat\beta_{m^*}$ in dependence of an
optimally chosen tuning parameter $m^* \in \mathbb{N}$ can attain this lower bound up to a constant $C > 0$,

$$\mathcal{R}_\omega[\hat\beta_{m^*};\mathcal{F},\mathcal{G}] \le C\,\mathcal{R}^*_\omega[n;\mathcal{F},\mathcal{G}]$$

for a variety of classes $\mathcal{F}$ and $\mathcal{G}$. In other words, $\mathcal{R}^*_\omega[n;\mathcal{F},\mathcal{G}]$ is the minimax rate of convergence
and $\hat\beta_{m^*}$ is minimax-optimal. The optimal choice $m^*$ of the tuning parameter, however, follows
from a classical squared-bias-variance compromise and requires a priori knowledge about the
classes $\mathcal{F}$ and $\mathcal{G}$, which is usually inaccessible in practice.

In this paper we propose a fully data-driven method to select a tuning parameter $\hat m$ in such a
way that the resulting data-driven estimator $\hat\beta_{\hat m}$ can still attain the minimax rate $\mathcal{R}^*_\omega[n;\mathcal{F},\mathcal{G}]$
up to a constant over a variety of classes $\mathcal{F}$ and $\mathcal{G}$. It is interesting to note that, considering
a linear regression model with infinitely many regressors, Goldenshluger and Tsybakov [2001,
2003] propose an optimal data-driven prediction procedure allowing sharp oracle inequalities.
However, a straightforward application of their results is not obvious to us since they assume
a priori standardised regressors, which in turn, in functional linear regression, necessitates the
covariance operator $\Gamma$ to be fully known in advance. In contrast, given a jointly normally
distributed regressor and error term, Verzelen [2010] establishes sharp oracle inequalities for
the prediction problem in case the covariance operator is not known in advance. It is worth
noting, though, that considering the mean prediction error as risk eliminates the ill-posedness
of the underlying problem, which in turn leads to faster minimax rates of convergence of the
prediction error than, for example, of the mean integrated squared error. On the other hand,
covering both of these two risks within the general framework discussed above, Comte and
Johannes [2010] consider functional linear regression with a circular functional regressor, which
results in a partial knowledge of the associated covariance operator, i.e. its eigenfunctions
are known in advance but the eigenvalues have to be estimated. In this situation Comte and
Johannes [2010] have successfully applied a model selection approach which is inspired by the
work of Barron et al. [1999], now extensively discussed in Massart [2007]. In the circular case, it
is possible to develop the unknown slope function in the eigenbasis of the covariance operator,
which in turn allows one to derive an orthogonal series estimator in dependence of a dimension
parameter. This dimension parameter has been chosen fully data-driven by a model selection
approach and it is shown that the resulting data-driven orthogonal series estimator can attain
minimax-optimal rates of convergence up to a constant. The proof, however, crucially relies on
the possibility to write the orthogonal series estimator as a minimizer of a contrast.
In this paper we do not impose a priori knowledge of the eigenbasis and, hence, the orthogonal
series estimator is no longer accessible to us. Instead, we consider the thresholded projection
estimator $\hat\beta_m$ as presented in Cardot and Johannes [2010], which we did not succeed in writing
as a minimizer of a contrast. Therefore, our selection method combines model selection and
Lepski's method (cf. Lepski [1990] and its recent review in Mathé [2006]), which is inspired by
a bandwidth selection method in kernel density estimation proposed recently by Goldenshluger
and Lepski [2011]. Selecting the dimension parameter $\hat m$ as minimizer of a stochastic penalized
contrast function imitating Lepski's method among a random collection of admissible values,
we show that the fully data-driven estimator $\hat\beta_{\hat m}$ can attain the minimax rate up to a constant
$C > 0$, that is

$$\mathcal{R}_\omega[\hat\beta_{\hat m};\mathcal{F},\mathcal{G}] \le C\cdot\mathcal{R}^*_\omega[n;\mathcal{F},\mathcal{G}] \tag{1.2}$$

for a variety of classes $\mathcal{F}$ and $\mathcal{G}$. We shall emphasize that the proposed estimator can attain
minimax-optimal rates without specifying in advance either that the slope function belongs
to a class of differentiable or analytic functions or that the decay of the eigenvalues is
polynomial or exponential. The only price for this flexibility is in terms of the constant $C$, which is
asymptotically not equal to one, i.e. the oracle inequality (1.2) is not sharp.
The paper is organized as follows: in Section 2 we briefly introduce the thresholded projection
estimator as proposed in Cardot and Johannes [2010]. We present the data-driven method
to select the tuning parameter and prove a first upper risk-bound for the fully data-driven
estimator $\hat\beta_{\hat m}$ which emphasizes the key arguments. In Section 3 we review the available minimax
theory as presented in Cardot and Johannes [2010]. Within this general framework we derive
upper risk-bounds for the fully data-driven estimator imposing additional assumptions on the
distribution of the functional regressor $X$ and the error term $\varepsilon$. Namely, we suppose first that $X$
and $\varepsilon$ are Gaussian random variables and second that they satisfy certain moment conditions.
In both cases the proof of the upper risk-bound employs the key arguments given in Section
2, while more technical aspects are deferred to the appendix. The results in this paper are
illustrated considering different configurations of classes $\mathcal{F}$ and $\mathcal{G}$. We recall the minimax rates
in these situations and show that up to a constant these rates are attained by the fully
data-driven estimator.

2 Methodology

Consider the functional linear model (1.1) where the random function $X$ and the error term $\varepsilon$
are independent. Let the centered random function $X$, i.e., $\mathbb{E}\langle X,h\rangle_{\mathbb{H}} = 0$ for all $h \in \mathbb{H}$, have a
finite second moment, i.e., $\mathbb{E}\|X\|^2_{\mathbb{H}} < \infty$. Multiplying both sides in (1.1) by $\langle X,h\rangle_{\mathbb{H}}$ and taking
the expectation leads to the normal equation

$$\langle g,h\rangle_{\mathbb{H}} := \mathbb{E}[Y\langle X,h\rangle_{\mathbb{H}}] = \mathbb{E}[\langle\beta,X\rangle_{\mathbb{H}}\langle X,h\rangle_{\mathbb{H}}] =: \langle\Gamma\beta,h\rangle_{\mathbb{H}}, \quad\text{for all } h\in\mathbb{H}, \tag{2.1}$$

where $g$ belongs to $\mathbb{H}$ and $\Gamma$ denotes the covariance operator associated with the random function
$X$. Throughout the paper we shall assume that there exists a solution $\beta \in \mathbb{H}$ of equation (2.1)
and that the covariance operator $\Gamma$ is strictly positive definite, which ensures the identifiability
of the slope function $\beta$ (cf. Cardot et al. [2003]). However, due to the finite second moment
of $X$ the associated covariance operator $\Gamma$ has a finite trace, i.e. it is nuclear. Thereby, solving
equation (2.1) to reconstruct the slope function $\beta$ is an ill-posed inverse problem with the
additional difficulty that $\Gamma$ is unknown and has to be estimated (for a detailed discussion of
ill-posed inverse problems in general we refer to Engl et al. [2000]).

2.1 Thresholded projection estimator 

In this paper, we follow Cardot and Johannes [2010] and consider a linear Galerkin approach
to derive an estimator of the slope function $\beta$. Here and subsequently, let $\{\psi_j, j\in\mathbb{N}\}$ be a
pre-specified orthonormal basis in $\mathbb{H}$ which in general does not correspond to the eigenbasis of the
operator $\Gamma$ defined in (2.1). With respect to this basis, we consider for all $h\in\mathbb{H}$ the development
$h = \sum_{j=1}^\infty [h]_j\psi_j$ where the sequence $([h]_j)_{j\ge1}$ with generic elements $[h]_j := \langle h,\psi_j\rangle_{\mathbb{H}}$ is
square-summable, i.e., $\|h\|^2_{\mathbb{H}} = \sum_{j\ge1}[h]_j^2 < \infty$. We will refer to any sequence $(a_n)_{n\in\mathbb{N}}$ as a whole
by omitting its index as for example in «the sequence $a$». Furthermore, given $m\in\mathbb{N}$ denote
$[h]_m := ([h]_1,\dots,[h]_m)^t$ (where $x^t$ denotes the transpose of $x$) and let $\mathbb{H}_m$ be the subspace of
$\mathbb{H}$ spanned by the first $m$ basis functions $\{\psi_1,\dots,\psi_m\}$. Obviously, if $h\in\mathbb{H}_m$ then the norm of
$h$ equals the Euclidean norm of its coefficient vector $[h]_m$, i.e., $\|h\|_{\mathbb{H}} = ([h]_m^t[h]_m)^{1/2} =: \|[h]_m\|$
with a slight abuse of notation. An element $\beta^m\in\mathbb{H}_m$ is called a Galerkin solution of equation
(2.1), if

$$\|g-\Gamma\beta^m\|_{\mathbb{H}} \le \|g-\Gamma\breve\beta\|_{\mathbb{H}}, \quad \forall\,\breve\beta\in\mathbb{H}_m. \tag{2.2}$$

Since the covariance operator $\Gamma$ is strictly positive definite, it follows that the $(m\times m)$-dimensional
covariance matrix $[\Gamma]_m := \mathbb{E}([X]_m[X]_m^t)$ associated with the $m$-dimensional random
vector $[X]_m$ is strictly positive definite too. Consequently, the Galerkin solution $\beta^m\in\mathbb{H}_m$
is uniquely determined by $[\beta^m]_m = [\Gamma]_m^{-1}[g]_m$ and $[\beta^m]_j = 0$ for all $j > m$. However, the
Galerkin solution does generally not correspond to the orthogonal projection of the slope
function onto the subspace $\mathbb{H}_m$. Moreover, let $(\mathrm{bias}_m)_{m\ge1}$ denote a sequence of approximation errors
given by $\mathrm{bias}_m := \sup_{k\ge m}\|\beta^k-\beta\|_\omega$, $m\ge1$. It is important to note that in general, without
further assumptions, the sequence $\mathrm{bias}$ does not converge to zero. Here and subsequently, however,
we restrict ourselves to classes $\mathcal{F}$ and $\mathcal{G}$ of slope functions and covariance operators respectively
which ensure this convergence. Obviously, this is a minimal regularity condition for us since
we aim to estimate the Galerkin solution. Assuming a sample $\{(Y_i,X_i)\}_{i=1}^n$ of $(Y,X)$ of size $n$,
it is natural to consider the estimators

$$\hat g := \frac{1}{n}\sum_{i=1}^n Y_iX_i, \quad\text{and}\quad \hat\Gamma := \frac{1}{n}\sum_{i=1}^n \langle\cdot,X_i\rangle_{\mathbb{H}}X_i$$

for $g$ and $\Gamma$ respectively. Moreover, let $[\hat\Gamma]_m := \frac{1}{n}\sum_{i=1}^n[X_i]_m[X_i]_m^t$ be the empirical $(m\times m)$-dimensional
covariance matrix and note that $[\hat g]_m = \frac{1}{n}\sum_{i=1}^nY_i[X_i]_m$. Replacing in (2.2)
the unknown quantities by their empirical counterparts, let $\hat\beta^m\in\mathbb{H}_m$ be a Galerkin solution
satisfying

$$\|\hat g-\hat\Gamma\hat\beta^m\|_{\mathbb{H}} \le \|\hat g-\hat\Gamma\breve\beta\|_{\mathbb{H}}, \quad \forall\,\breve\beta\in\mathbb{H}_m.$$

Observe that there always exists a solution $\hat\beta^m$, but it might not be unique. Obviously, if $[\hat\Gamma]_m$
is non-singular then $[\hat\beta^m]_m = [\hat\Gamma]_m^{-1}[\hat g]_m$. We shall emphasize the multiplication with the inverse
of the random matrix $[\hat\Gamma]_m$, which may result in an unstable estimator even in case $[\Gamma]_m$ is well
conditioned. Let $1\{\|[\hat\Gamma]_m^{-1}\|_s\le n\}$ denote the indicator function which takes the value one if $[\hat\Gamma]_m$
is non-singular with spectral norm $\|[\hat\Gamma]_m^{-1}\|_s := \sup_{\|z\|=1}\|[\hat\Gamma]_m^{-1}z\|$ of its inverse bounded by $n$,
and the value zero otherwise. The estimator $\hat\beta_m$ of $\beta$ proposed by Cardot and Johannes [2010]
consists in thresholding the estimated Galerkin solution, that is,

$$\hat\beta_m := \hat\beta^m\,1\{\|[\hat\Gamma]_m^{-1}\|_s\le n\}. \tag{2.3}$$

In the next paragraph we introduce a data-driven method to select the dimension parameter
$m\in\mathbb{N}$.
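
As a purely illustrative sketch (not the authors' implementation), the thresholded projection estimator (2.3) can be computed from basis coefficients as follows. The function name `thresholded_projection` and the input layout (an array `coefs` whose row $i$ holds the first coefficients of $X_i$ in the chosen basis, assumed to be precomputed) are assumptions made for this example. Since the empirical covariance matrix is symmetric, the spectral norm of its inverse is the reciprocal of its smallest eigenvalue, which is what the threshold check uses.

```python
import numpy as np

def thresholded_projection(coefs, y, m):
    """First m coefficients of the thresholded estimator in (2.3).

    coefs : (n, J) array, row i holds [X_i]_1, ..., [X_i]_J
            (hypothetical input: basis coefficients are assumed precomputed)
    y     : (n,) array of centered responses
    m     : dimension parameter, 1 <= m <= J
    """
    n = len(y)
    Xm = coefs[:, :m]
    gram = Xm.T @ Xm / n                      # empirical covariance matrix
    g = Xm.T @ y / n                          # empirical cross-covariance vector
    lam_min = np.linalg.eigvalsh(gram)[0]     # eigenvalues come in ascending order
    # Indicator in (2.3): inverse must exist and have spectral norm at most n.
    if lam_min <= 0 or 1.0 / lam_min > n:
        return np.zeros(m)
    return np.linalg.solve(gram, g)           # estimated Galerkin solution
```

When the empirical covariance matrix is singular or badly conditioned the sketch returns the zero vector, mirroring the indicator in (2.3).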

2.2 Data-driven selection of the dimension parameter 

Our selection method combines model selection (cf. Barron et al. [1999] and its discussion in
Massart [2007]) and Lepski's method (cf. Lepski [1990]), borrowing ideas from Goldenshluger
and Lepski [2011]. We select the dimension parameter as minimizer of a penalized contrast
function depending on the weighted norm $\|\cdot\|_\omega$, which we formalize next. Let $(\omega_j)_{j\ge1}$ be a strictly
positive sequence of weights. We define for $h\in\mathbb{H}$ the weighted norm by $\|h\|^2_\omega := \sum_{j\ge1}\omega_j[h]_j^2$.
Furthermore, for $m\ge1$, $[\nabla_\omega]_m$ and $[\mathrm{Id}]_m$ denote respectively the $m$-dimensional diagonal
matrix with diagonal entries $(\omega_j)_{1\le j\le m}$ and the identity matrix, where for all $h\in\mathbb{H}_m$ we have
$\|h\|^2_\omega = [h]_m^t[\nabla_\omega]_m[h]_m = \|[\nabla_\omega]_m^{1/2}[h]_m\|^2$. Given a sequence $K := ([K]_k)_{k\ge1}$ of matrices, denote
by

$$\Delta_m(K) := \max_{1\le k\le m}\|[\nabla_\omega]_k^{1/2}[K]_k^{-1}[\nabla_\omega]_k^{1/2}\|_s \quad\text{and}\quad \delta_m(K) := m\,\Delta_m(K)\,\frac{\log(\Delta_m(K)\vee(m+2))}{\log(m+2)}. \tag{2.4}$$

Take as an example $\Delta_m^\omega := \Delta_m(K)$ with $K = ([\mathrm{Id}]_m)_{m\ge1}$, which satisfies $\Delta_m^\omega = \max_{1\le j\le m}\omega_j$.
For $n\ge1$, set $M_n^\omega := \max\{1\le m\le\lfloor n^{1/4}\rfloor : \Delta_m^\omega\le n\}$. The dimension parameter is selected
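among a collection of admissible values $\{1,\dots,\hat M\}$ with random integer $\hat M$ given by

$$\hat M := \min\Bigl\{2\le m\le M_n^\omega : m\,\Delta_m^\omega\,\|[\hat\Gamma]_m^{-1}\|_s > \frac{n}{1+\log n}\Bigr\} - 1, \tag{2.5}$$

where we set $\hat M := M_n^\omega$ if the min runs over an empty set and $\lfloor a\rfloor$ denotes as usual the integer
part of $a$. Furthermore, we define a stochastic sequence of penalties $(\widehat{\mathrm{pen}}_m)_{1\le m\le\hat M}$ which takes
its inspiration from Comte and Johannes [2010]. Let $\hat\delta_m := \delta_m(K)$ with $K = ([\hat\Gamma]_m)_{m\ge1}$ and

$$\widehat{\mathrm{pen}}_m := 14\,\kappa\,\hat\sigma_m^2\,\hat\delta_m\,n^{-1} \quad\text{with}\quad \hat\sigma_m^2 := 2\Bigl(\frac{1}{n}\sum_{i=1}^nY_i^2 + [\hat g]_m^t[\hat\Gamma]_m^{-1}[\hat g]_m\Bigr) \tag{2.6}$$

where $\kappa$ is a positive constant to be chosen below. The random integer $\hat M$ and the stochastic
penalties $(\widehat{\mathrm{pen}}_m)_{1\le m\le\hat M}$ are used to define the sequence $(\hat\Psi_m)_{1\le m\le\hat M}$ of contrasts by

$$\hat\Psi_m := \max_{m\le k\le\hat M}\bigl\{\|\hat\beta_k-\hat\beta_m\|_\omega^2 - \widehat{\mathrm{pen}}_k\bigr\}.$$

Setting $\arg\min_{m\in A}\{a_m\} := \min\{m : a_m\le a_{m'},\ \forall m'\in A\}$ for a sequence $(a_m)_{m\ge1}$ with
minimal value in $A\subset\mathbb{N}$, we select the dimension parameter

$$\hat m := \arg\min_{1\le m\le\hat M}\bigl\{\hat\Psi_m + \widehat{\mathrm{pen}}_m\bigr\}. \tag{2.7}$$

The estimator of $\beta$ is now given by $\hat\beta_{\hat m}$ and below we derive an upper bound for its risk. By
construction the choice of the dimension parameter and hence the estimator $\hat\beta_{\hat m}$ do not rely on
the regularity assumptions on the slope and the operator which we formalize in Section 3.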
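
The selection rule of this section can be sketched in code. The following is a minimal illustration, not the authors' implementation: it assumes that $\delta_m(K) = m\,\Delta_m(K)\log(\Delta_m(K)\vee(m+2))/\log(m+2)$ in (2.4) and that the threshold in (2.5) is $n/(1+\log n)$; the function name `select_dimension`, the input layout and the handling of singular matrices (infinite norms) are choices made for this example only.

```python
import numpy as np

def select_dimension(coefs, y, omega, kappa=96.0):
    """Sketch of (2.5)-(2.7). Returns (m_hat, M_hat, coefficients, penalties).

    coefs : (n, J) basis coefficients of the regressors (assumed precomputed)
    y     : (n,) centered responses
    omega : (J,) strictly positive weights of the norm ||.||_omega
    """
    n, J = coefs.shape
    cap = int(round(n ** 0.25))               # floor(n^(1/4)), guarded against rounding
    while cap ** 4 > n:
        cap -= 1
    cap = max(1, min(cap, J))
    run_max = np.maximum.accumulate(omega[:cap])      # Delta_m^omega
    M_omega = max(1, int(np.sum(run_max <= n)))       # M_n^omega

    inv_norm = np.full(M_omega, np.inf)   # spectral norm of inverse covariance matrix
    w_norm = np.full(M_omega, np.inf)     # same, sandwiched by the weight matrix
    quad = np.full(M_omega, np.inf)       # quadratic form g^t Gamma^{-1} g
    beta = np.zeros((M_omega, M_omega))   # row m-1: thresholded estimator, zero-padded
    for m in range(1, M_omega + 1):
        Xm = coefs[:, :m]
        gram, g = Xm.T @ Xm / n, Xm.T @ y / n
        lam_min = np.linalg.eigvalsh(gram)[0]
        if lam_min <= np.finfo(float).eps:
            continue                       # treated as singular: norms stay infinite
        inv = np.linalg.inv(gram)
        root = np.sqrt(omega[:m])
        inv_norm[m - 1] = 1.0 / lam_min
        w_norm[m - 1] = np.linalg.norm(root[:, None] * inv * root[None, :], 2)
        quad[m - 1] = g @ inv @ g
        if inv_norm[m - 1] <= n:           # threshold of (2.3)
            beta[m - 1, :m] = inv @ g

    # Random upper bound M_hat as in (2.5), with the assumed threshold.
    M_hat = M_omega
    for m in range(2, M_omega + 1):
        if m * run_max[m - 1] * inv_norm[m - 1] > n / (1.0 + np.log(n)):
            M_hat = m - 1
            break

    # Stochastic penalties as in (2.6).
    ms = np.arange(1, M_hat + 1)
    Delta = np.maximum.accumulate(w_norm[:M_hat])
    delta = ms * Delta * np.log(np.maximum(Delta, ms + 2)) / np.log(ms + 2)
    pen = 14.0 * kappa * 2.0 * (np.mean(y ** 2) + quad[:M_hat]) * delta / n

    # Contrast plus penalty, minimised over 1..M_hat as in (2.7).
    B, w = beta[:M_hat, :M_hat], omega[:M_hat]
    crit = np.empty(M_hat)
    for m in range(M_hat):
        dist = ((B[m:] - B[m]) ** 2) @ w   # weighted squared distances for k >= m
        crit[m] = np.max(dist - pen[m:]) + pen[m]
    m_hat = int(np.argmin(crit)) + 1       # argmin returns the smallest minimiser
    return m_hat, M_hat, B[m_hat - 1], pen
```

With the constants $14\kappa$ and $\kappa = 96$ the penalties are large for moderate sample sizes, so this sketch will typically select small dimensions; the constants matter for the theory rather than for practical tuning.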

2.3 Upper risk bound for the data-driven thresholded projection estimator 

The next assertion states the key argument in the proof of the upper risk-bound. 

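Lemma 2.1. Let $(\mathrm{bias}_m)_{m\ge1}$ be the sequence of approximation errors $\mathrm{bias}_m = \sup_{k\ge m}\|\beta^k-\beta\|_\omega$.
Consider an arbitrary sequence of penalties $(\mathrm{pen}_m)_{m\ge1}$, an upper bound $M\in\mathbb{N}$, and
the sequence $(\Psi_m)_{m\ge1}$ of contrasts given by $\Psi_m := \max_{m\le k\le M}\{\|\hat\beta_k-\hat\beta_m\|^2_\omega-\mathrm{pen}_k\}$. If the
subsequence $(\mathrm{pen}_1,\dots,\mathrm{pen}_M)$ is non-decreasing, then we have for the selected model $\tilde m :=
\arg\min_{1\le m\le M}\{\Psi_m+\mathrm{pen}_m\}$ and for all $1\le m\le M$ that

$$\|\hat\beta_{\tilde m}-\beta\|_\omega^2 \le 7\,\mathrm{pen}_m + 78\,\mathrm{bias}_m^2 + 42\max_{m\le k\le M}\Bigl(\|\hat\beta_k-\beta^k\|_\omega^2 - \frac{1}{6}\mathrm{pen}_k\Bigr)_+ \tag{2.8}$$

where $(a)_+ = \max(a,0)$.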

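Proof of Lemma 2.1. From the definition of $\tilde m$ we deduce for all $1\le m\le M$ that

$$\begin{aligned}
\|\hat\beta_{\tilde m}-\beta\|_\omega^2 &\le 3\bigl\{\Psi_m+\mathrm{pen}_{\tilde m}+\Psi_{\tilde m}+\mathrm{pen}_m+\|\hat\beta_m-\beta\|_\omega^2\bigr\}\\
&\le 6\{\Psi_m+\mathrm{pen}_m\} + 3\|\hat\beta_m-\beta\|_\omega^2.
\end{aligned} \tag{2.9}$$

Since $(\mathrm{pen}_1,\dots,\mathrm{pen}_M)$ is non-decreasing and $4\,\mathrm{bias}_m^2 \ge \max_{m\le k\le M}\|\beta^k-\beta^m\|_\omega^2$, it is for all
$1\le m\le M$ easily verified that

$$\Psi_m \le 6\sup_{m\le k\le M}\Bigl(\|\hat\beta_k-\beta^k\|_\omega^2-\frac{1}{6}\mathrm{pen}_k\Bigr)_+ + 12\,\mathrm{bias}_m^2.$$

The last estimate allows us for all $1\le m\le M$ to write

$$\|\hat\beta_m-\beta\|_\omega^2 \le \frac{1}{3}\mathrm{pen}_m + 2\,\mathrm{bias}_m^2 + 2\sup_{m\le k\le M}\Bigl(\|\hat\beta_k-\beta^k\|_\omega^2-\frac{1}{6}\mathrm{pen}_k\Bigr)_+.$$

From the last inequality and (2.9), we obtain the assertion (2.8), which completes the proof. □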
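
For readers who want the step described as easily verified spelled out, the following derivation is one way to obtain the bound on the contrast; it is a sketch that uses only the elementary inequality $(x+y+z)^2\le3(x^2+y^2+z^2)$, the monotonicity $\mathrm{pen}_m\le\mathrm{pen}_k$ for $m\le k\le M$, and $\|\beta^k-\beta^m\|_\omega^2\le4\,\mathrm{bias}_m^2$. The split of $\mathrm{pen}_k$ into two halves is a choice of this sketch.

```latex
\begin{align*}
\|\hat\beta_k-\hat\beta_m\|_\omega^2-\mathrm{pen}_k
 &\le 3\|\hat\beta_k-\beta^k\|_\omega^2+3\|\beta^k-\beta^m\|_\omega^2
      +3\|\beta^m-\hat\beta_m\|_\omega^2
      -\tfrac12\mathrm{pen}_k-\tfrac12\mathrm{pen}_k\\
 &\le 3\Bigl(\|\hat\beta_k-\beta^k\|_\omega^2-\tfrac16\mathrm{pen}_k\Bigr)
      +3\Bigl(\|\hat\beta_m-\beta^m\|_\omega^2-\tfrac16\mathrm{pen}_m\Bigr)
      +12\,\mathrm{bias}_m^2\\
 &\le 6\sup_{m\le j\le M}\Bigl(\|\hat\beta_j-\beta^j\|_\omega^2
      -\tfrac16\mathrm{pen}_j\Bigr)_+ +12\,\mathrm{bias}_m^2 .
\end{align*}
```

Taking the maximum over $m\le k\le M$ on the left-hand side gives the stated bound on the contrast. Inserting it into (2.9) together with the bound on $\|\hat\beta_m-\beta\|_\omega^2$ yields the constants $6+1=7$, $72+6=78$ and $36+6=42$ of (2.8).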

In addition to the last assertion, the proof of the upper risk bound requires two assumptions
which we state next. For $n\ge1$ and a positive sequence $a := (a_m)_{m\ge1}$ denote

$$M_n(a) := \min\Bigl\{2\le m\le M_n^\omega : m\,\Delta_m^\omega\,a_m > \frac{n}{1+\log n}\Bigr\} - 1 \tag{2.10}$$

where we set $M_n(a) := M_n^\omega$ if the set is empty. Observe that $\hat M$ given in (2.5) satisfies $\hat M =
M_n(a)$ with $a = (\|[\hat\Gamma]_m^{-1}\|_s)_{m\ge1}$. Consider for $m\ge1$, $\delta_m^\Gamma := \delta_m(K)$ with $K = ([\Gamma]_m)_{m\ge1}$
and

$$\mathrm{pen}_m := \kappa\,\sigma_m^2\,\delta_m^\Gamma\,n^{-1} \quad\text{with}\quad \sigma_m^2 := 2\bigl(\mathbb{E}Y^2 + [g]_m^t[\Gamma]_m^{-1}[g]_m\bigr) \tag{2.11}$$

which are obviously only theoretical counterparts of the random objects given in (2.6). In order
to control the third right-hand side term in the upper bound (2.8), the remainder term, we
impose the following assumption, though we show in Section 3 under reasonable assumptions
on the distribution of $\varepsilon$ and $X$ that it holds true for a wide range of classes $\mathcal{F}$ and $\mathcal{G}$.

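Assumption 2.1. There exist sequences $(m_n^\diamond)_{n\ge1}$ and $(M_n^+)_{n\ge1}$, and a constant $K_1$ such that

$$\sup_{\beta\in\mathcal{F}}\,\sup_{\Gamma\in\mathcal{G}}\,\mathbb{E}\Bigl\{\max_{m_n^\diamond\le k\le M_n^+}\Bigl(\|\hat\beta_k-\beta^k\|_\omega^2-\frac{1}{6}\mathrm{pen}_k\Bigr)_+\Bigr\} \le K_1\,n^{-1}$$

for all $n\ge1$.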



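In the following we decompose the risk with respect to an event $\mathcal{E}_n$, on which $\widehat{\mathrm{pen}}$ is comparable
to its theoretical counterpart $\mathrm{pen}$ and $\hat M$ lies between $m_n^\diamond$ and $M_n^+$ given by Assumption 2.1,
and its complement $\mathcal{E}_n^c$. To be precise, we define the event

$$\mathcal{E}_n := \bigl\{\mathrm{pen}_k\le\widehat{\mathrm{pen}}_k\le72\,\mathrm{pen}_k,\ \forall\,m_n^\diamond\le k\le M_n^+\bigr\}\cap\bigl\{m_n^\diamond\le\hat M\le M_n^+\bigr\} \tag{2.12}$$

and consider, for every $\beta\in\mathcal{F}$ and $\Gamma\in\mathcal{G}$, the elementary identity

$$\mathbb{E}\|\hat\beta_{\hat m}-\beta\|_\omega^2 = \mathbb{E}\bigl(\|\hat\beta_{\hat m}-\beta\|_\omega^2\,1_{\mathcal{E}_n}\bigr) + \mathbb{E}\bigl(\|\hat\beta_{\hat m}-\beta\|_\omega^2\,1_{\mathcal{E}_n^c}\bigr), \tag{2.13}$$

both sides of which are subsequently bounded uniformly over the classes $\mathcal{F}$ and $\mathcal{G}$.
The conditions on the distribution of $\varepsilon$ and $X$ presented in the next section are also sufficient
to show that the following assumption holds true.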
Assumption 2.2. There exists a constant $K_2 > 0$ such that

$$\sup_{\beta\in\mathcal{F}}\,\sup_{\Gamma\in\mathcal{G}}\,\mathbb{E}\bigl(\|\hat\beta_{\hat m}-\beta\|_\omega^2\,1_{\mathcal{E}_n^c}\bigr) \le K_2\,n^{-1} \quad\text{for all } n\ge1.$$

The next assertion provides an upper bound for the maximal $\mathcal{F}_\omega$-risk over the classes $\mathcal{F}$
and $\mathcal{G}$ of the thresholded projection estimator $\hat\beta_{\hat m}$ with data-driven choice $\hat m$ given by (2.7).

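Proposition 2.2. If Assumptions 2.1 and 2.2 hold true, then we have

$$\mathcal{R}_\omega[\hat\beta_{\hat m};\mathcal{F},\mathcal{G}] \le 504\,\sup_{\beta\in\mathcal{F}}\,\sup_{\Gamma\in\mathcal{G}}\bigl\{\mathrm{pen}_{m_n^\diamond}+\mathrm{bias}^2_{m_n^\diamond}\bigr\} + (504\,K_1+K_2)\,n^{-1} \quad\text{for all } n\ge1.$$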

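Proof of Proposition 2.2. We make use of the elementary identity (2.13) and, taking into
account Assumption 2.2, we derive for all $n\ge1$

$$\mathcal{R}_\omega[\hat\beta_{\hat m};\mathcal{F},\mathcal{G}] \le \sup_{\beta\in\mathcal{F}}\,\sup_{\Gamma\in\mathcal{G}}\,\mathbb{E}\bigl(\|\hat\beta_{\hat m}-\beta\|_\omega^2\,1_{\mathcal{E}_n}\bigr) + K_2\,n^{-1}. \tag{2.14}$$

We observe that the random subsequences $(\hat\sigma_1^2,\dots,\hat\sigma_{\hat M}^2)$ and hence $(\widehat{\mathrm{pen}}_1,\dots,\widehat{\mathrm{pen}}_{\hat M})$ are by
construction monotonically non-decreasing. Indeed, for all $1\le m\le k\le\hat M$ the identity
$\langle\hat\Gamma(\hat\beta^k-\hat\beta^m),(\hat\beta^k-\hat\beta^m)\rangle_{\mathbb{H}} = [\hat g]_k^t[\hat\Gamma]_k^{-1}[\hat g]_k - [\hat g]_m^t[\hat\Gamma]_m^{-1}[\hat g]_m$ holds true. Therefore, by using
that $\hat\Gamma$ is positive definite it follows that $[\hat g]_m^t[\hat\Gamma]_m^{-1}[\hat g]_m \le [\hat g]_k^t[\hat\Gamma]_k^{-1}[\hat g]_k$, and hence $\hat\sigma_m^2\le\hat\sigma_k^2$.
Consequently, Lemma 2.1 is applicable for all $1\le m\le\hat M$ and we obtain

$$\|\hat\beta_{\hat m}-\beta\|_\omega^2 \le 7\,\widehat{\mathrm{pen}}_m + 78\,\mathrm{bias}_m^2 + 42\max_{m\le k\le\hat M}\Bigl(\|\hat\beta_k-\beta^k\|_\omega^2-\frac{1}{6}\widehat{\mathrm{pen}}_k\Bigr)_+.$$

On the event $\mathcal{E}_n$ defined in (2.12), where $m_n^\diamond\le\hat M\le M_n^+$ and $\mathrm{pen}_k\le\widehat{\mathrm{pen}}_k\le72\,\mathrm{pen}_k$, we deduce
from the last bound with $m = m_n^\diamond$ that for all $n\ge1$

$$\|\hat\beta_{\hat m}-\beta\|_\omega^2\,1_{\mathcal{E}_n} \le 504\,\mathrm{pen}_{m_n^\diamond} + 78\,\mathrm{bias}^2_{m_n^\diamond} + 42\sup_{m_n^\diamond\le k\le M_n^+}\Bigl(\|\hat\beta_k-\beta^k\|_\omega^2-\frac{1}{6}\mathrm{pen}_k\Bigr)_+$$

which by taking into account Assumption 2.1 implies that

$$\sup_{\beta\in\mathcal{F}}\,\sup_{\Gamma\in\mathcal{G}}\,\mathbb{E}\bigl(\|\hat\beta_{\hat m}-\beta\|_\omega^2\,1_{\mathcal{E}_n}\bigr) \le 504\,\sup_{\beta\in\mathcal{F}}\,\sup_{\Gamma\in\mathcal{G}}\bigl\{\mathrm{pen}_{m_n^\diamond}+\mathrm{bias}^2_{m_n^\diamond}\bigr\} + 504\,K_1\,n^{-1} \quad\text{for all } n\ge1.$$

We obtain the claim of the proposition by combination of the last bound and (2.14). □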

Remark 2.1. The upper risk-bound given in the last assertion is strongly reminiscent of a
variance/squared-bias decomposition of the $\mathcal{F}_\omega$-risk associated with the estimator $\hat\beta_{m_n^\diamond}$
employing the dimension parameter $m_n^\diamond$. Indeed, in many cases the penalty term $\mathrm{pen}_{m_n^\diamond}$ is of the
same order as the variance of the estimator $\hat\beta_{m_n^\diamond}$ (cf. Illustration 3.1 [P-P] and [E-P] below).
In this situation we obviously wish that the parameter $m_n^\diamond$ just realizes the balance between
the variance and the squared-bias term, which in many cases can lead to an optimal
estimation procedure. However, the construction of the penalty term is more involved to ensure
that Assumptions 2.1 and 2.2 can be satisfied (cf. Illustration 3.1 [P-E]). □
3 Minimax-optimality



In this section we first recall a general framework proposed by Cardot and Johannes [2010] which
allows one to derive minimax-optimal rates for the maximal $\mathcal{F}_\omega$-risk, $\sup_{\beta\in\mathcal{F}}\sup_{\Gamma\in\mathcal{G}}\mathbb{E}\|\hat\beta-\beta\|_\omega^2$, over
the classes $\mathcal{F}$ and $\mathcal{G}$. Placing ourselves in this framework, we can derive the main results of this paper,
which state that the proposed data-driven procedure indeed can attain these minimax rates.

3.1 Notations and basic assumptions

The additional regularity conditions $\beta\in\mathcal{F}$ and $\Gamma\in\mathcal{G}$ imposed on the slope function and
the covariance operator, respectively, are characterized by different weighted norms in $\mathbb{H}$ with
respect to the pre-specified orthonormal basis $\{\psi_j, j\in\mathbb{N}\}$ in $\mathbb{H}$, which we formalize now. Given
a strictly positive sequence of weights $b = (b_j)_{j\ge1}$ and a radius $r > 0$, let $\mathcal{F}_b$ be the completion
of $\mathbb{H}$ with respect to the weighted norm $\|\cdot\|_b$; then we consider in the following the ellipsoid
$\mathcal{F}_b^r := \{h\in\mathcal{F}_b : \|h\|_b^2\le r\}$ as class of possible slope functions. Furthermore, as usual in the
context of ill-posed inverse problems, we link the mapping properties of the covariance operator
$\Gamma$ and the regularity condition $\beta\in\mathcal{F}_b^r$. Therefore, consider the sequence $(\langle\Gamma\psi_j,\psi_j\rangle_{\mathbb{H}})_{j\ge1}$ which
sums up to $\mathbb{E}\|X\|^2_{\mathbb{H}}$, i.e. $\Gamma$ is nuclear, and hence converges to zero. In what follows we impose
restrictions on the decay of this sequence. Denote by $\mathcal{N}$ the set of all strictly positive nuclear
operators defined on $\mathbb{H}$. Given a strictly positive sequence of weights $\gamma$ and a constant $d\ge1$
define the class $\mathcal{G}_\gamma^d\subset\mathcal{N}$ of covariance operators by

$$\mathcal{G}_\gamma^d := \bigl\{T\in\mathcal{N} : \|f\|^2_{\gamma^2}/d^2 \le \|Tf\|^2_{\mathbb{H}} \le d^2\,\|f\|^2_{\gamma^2},\ \forall f\in\mathbb{H}\bigr\}$$

where arithmetic operations on sequences are defined element-wise, e.g. $\gamma^2 = (\gamma_j^2)_{j\ge1}$. Let us
briefly discuss the last definition. If $T\in\mathcal{G}_\gamma^d$, then we have $d^{-1}\le\langle T\psi_j,\psi_j\rangle_{\mathbb{H}}/\gamma_j\le d$, for all
$j\ge1$. Consequently, the sequence $\gamma$ is necessarily summable, because $T$ is nuclear. Moreover,
if $\lambda$ denotes the sequence of eigenvalues of $T$ then $d^{-1}\le\lambda_j/\gamma_j\le d$, for all $j\ge1$. In other
words the sequence $\gamma$ characterizes the decay of the eigenvalues of $T\in\mathcal{G}_\gamma^d$. We do not specify
the sequences of weights $\omega$, $b$ and $\gamma$, but impose from now on the following minimal regularity
conditions.

Assumption 3.1. Let $(\omega_j)_{j\ge1}$, $(b_j)_{j\ge1}$, and $(\gamma_j)_{j\ge1}$ be strictly positive sequences of weights with
$b_1 = 1$, $\omega_1 = 1$, $\gamma_1 = 1$, and $\sum_{j=1}^\infty\gamma_j < \infty$ such that the sequences $b^{-1}$, $\omega b^{-1}$, $\gamma$, and $\gamma^2\omega^{-1}$
are monotonically non-increasing and converging to zero.

The last assumption is fairly mild. For example, assuming that $\omega b^{-1}$ is non-increasing
ensures that $\mathcal{F}_b^r\subset\mathcal{F}_\omega$. Furthermore, it is shown in Cardot and Johannes [2010] that the
minimax rate $\mathcal{R}^*_\omega[n;\mathcal{F}_b^r,\mathcal{G}_\gamma^d]$ is of order $n^{-1}$ for all sequences $\gamma$ and $\omega$ such that $\gamma^2\omega^{-1}$ is
non-decreasing. We will illustrate all our results considering the following three configurations for
the sequences $\omega$, $b$ and $\gamma$.

Illustration 3.1. In all three cases, we take $\omega_j = j^{2s}$, $j\ge1$. Moreover, let

[P-P] $b_j = j^{2p}$ and $\gamma_j = j^{-2a}$, $j\ge1$, with $p > 0$, $a > 1/2$, and $p > s > -2a$;

[E-P] $b_j = \exp(j^{2p}-1)$ and $\gamma_j = j^{-2a}$, $j\ge1$, with $p > 0$, $a > 1/2$, and $s > -2a$;

[P-E] $b_j = j^{2p}$ and $\gamma_j = \exp(-j^{2a}+1)$, $j\ge1$, with $p > 0$, $a > 0$, and $p > s$;

then Assumption 3.1 is satisfied in all cases. □

Remark 3.1. In the configurations [P-P] and [E-P], the case $s = -a$ can be interpreted as
mean-prediction error (cf. Cardot and Johannes [2010]). Moreover, if $\{\psi_j\}$ is the trigonometric
basis and the value of $s$ is an integer, then the weighted norm $\|h\|_\omega$ corresponds to the $L^2$-norm
of the weak $s$-th derivative of $h$ (cf. Neubauer [1988]). In other words, in this situation we
consider as risk the mean integrated squared error when estimating the $s$-th derivative of $\beta$.
Moreover, in the configurations [P-P] and [P-E], the additional condition $p > s$ means that
the slope function has at least $p\ge s+1$ weak derivatives, while for a value $p > 1$ in [E-P], the
slope function is assumed to be an analytic function (cf. Kawata [1972]). □

3.2 Minimax optimal estimation reviewed

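Let us first recall a lower bound of the maximal $\mathcal{F}_\omega$-risk over the classes $\mathcal{F}_b^r$ and $\mathcal{G}_\gamma^d$ due to
Cardot and Johannes [2010]. Given an i.i.d. sample of $(Y,X)$ of size $n$ and sequences $\omega$, $b$ and
$\gamma$ satisfying Assumption 3.1 define

$$m_n^* := \arg\min_{m\ge1}\Bigl\{\max\Bigl(\frac{\omega_m}{b_m},\,\sum_{j=1}^m\frac{\omega_j}{n\gamma_j}\Bigr)\Bigr\} \quad\text{and}\quad \mathcal{R}_n^* := \max\Bigl(\frac{\omega_{m_n^*}}{b_{m_n^*}},\,\sum_{j=1}^{m_n^*}\frac{\omega_j}{n\gamma_j}\Bigr). \tag{3.1}$$

If in addition $\xi := \inf_{n\ge1}\{(\mathcal{R}_n^*)^{-1}\min(\omega_{m_n^*}b_{m_n^*}^{-1},\,\sum_{j=1}^{m_n^*}\omega_j(n\gamma_j)^{-1})\} > 0$, then there exists a
constant $C := C(\sigma,r,d,\xi) > 0$ depending on $\sigma$, $r$, $d$ and $\xi$ only such that

$$\inf_{\tilde\beta}\,\mathcal{R}_\omega[\tilde\beta;\mathcal{F}_b^r,\mathcal{G}_\gamma^d] \ge C\,\mathcal{R}_n^* \quad\text{for all } n\ge1. \tag{3.2}$$

On the other hand, considering the dimension parameter $m_n^*$ given in (3.1), Cardot and Johannes
[2010] have shown that the maximal risk $\mathcal{R}_\omega[\hat\beta_{m_n^*};\mathcal{F}_b^r,\mathcal{G}_\gamma^d]$ of the estimator $\hat\beta_{m_n^*}$ defined in (2.3)
is bounded by $\mathcal{R}_n^*$ up to a constant for a wide range of sequences $\omega$, $b$, and $\gamma$, provided the
random function $X$ and the error $\varepsilon$ satisfy certain additional moment conditions. In other
words, $\mathcal{R}_n^* = \mathcal{R}^*_\omega[n;\mathcal{F}_b^r,\mathcal{G}_\gamma^d]$ is the minimax rate in this situation and the estimator $\hat\beta_{m_n^*}$ is
minimax optimal. However, the definition of the dimension parameter $m_n^*$ necessitates
a priori knowledge of the sequences $b$ and $\gamma$. In the remaining part of this paper we show that
the data-driven choice of the dimension parameter constructed in Section 2 can automatically
attain the minimax rate $\mathcal{R}_n^*$ for a variety of sequences $\omega$, $b$, and $\gamma$. Before doing so, let us briefly illustrate
the minimax result.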

Illustration (continued) 3.2. Considering the three configurations (see Illustration 3.1),
it has been shown in Cardot and Johannes [2010] that the estimator $\hat\beta_{m_n^*}$ with $m_n^*$ as given
below attains the rate $\mathcal{R}_n^*$ up to a constant. We write for two strictly positive sequences $(a_n)_{n\ge1}$
and $(b_n)_{n\ge1}$ that $a_n\sim b_n$, if $(a_n/b_n)_{n\ge1}$ is bounded away from $0$ and infinity.

[P-P] It is easily seen that $m_n^*\sim n^{1/(2p+2a+1)}$ if $2s+2a+1 > 0$, $m_n^*\sim n^{1/[2(p-s)]}$ if $2s+2a+1 < 0$
and $m_n^*\sim(n/\log n)^{1/[2(p-s)]}$ if $2a+2s+1 = 0$, which in turn implies that $\mathcal{R}_n^*\sim
\max(n^{-(2p-2s)/(2a+2p+1)},\,n^{-1})$ if $2s+2a+1\ne0$ (and $\mathcal{R}_n^*\sim\log(n)/n$ if $2s+2a+1 = 0$).
Observe that an increasing value of $a$ leads to a slower minimax rate $\mathcal{R}_n^*$. Therefore, the
parameter $a$ is called degree of ill-posedness (cf. Natterer [1984]).

[E-P] If $2a+2s+1 > 0$, then $m_n^*\sim(\log n-\frac{2a+1}{2p}\log(\log n))^{1/(2p)}$ and $\mathcal{R}_n^*\sim n^{-1}(\log n)^{(2a+1+2s)/(2p)}$.
Furthermore, if $2a+2s+1 < 0$, then $m_n^*\sim(\log n+(s/p)\log(\log n))^{1/(2p)}$ and $\mathcal{R}_n^*\sim n^{-1}$,
while $\mathcal{R}_n^*\sim\log(\log n)/n$ if $2a+2s+1 = 0$.

[P-E] We have $m_n^*\sim(\log n-\frac{2p+(2a-1)_+}{2a}\log(\log n))^{1/(2a)}$. Thereby, $\mathcal{R}_n^*\sim(\log n)^{-(p-s)/a}$. The
parameter $a$ reflects again the degree of ill-posedness since an increasing value of $a$ leads
also here to a slower minimax rate $\mathcal{R}_n^*$. □
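
The polynomial rate in configuration [P-P] can be checked numerically. The sketch below evaluates the balance between $\omega_m/b_m$ and $\sum_{j\le m}\omega_j/(n\gamma_j)$ on a finite grid of dimensions and fits the slope of $\log\mathcal{R}_n^*$ against $\log n$. The helper name `oracle_dimension`, the truncation of the minimisation to 1000 dimensions and the parameter values $p = a = 1$, $s = 0$ are assumptions of this example, not part of the paper.

```python
import numpy as np

def oracle_dimension(n, omega, b, gamma):
    """Minimise max(omega_m / b_m, sum_{j<=m} omega_j / (n gamma_j)) over the grid.

    The minimisation is truncated to m = 1, ..., len(omega); this is an
    assumption of the sketch and is harmless when the minimiser is small.
    """
    bias_term = omega / b
    var_term = np.cumsum(omega / gamma) / n
    objective = np.maximum(bias_term, var_term)
    i = int(np.argmin(objective))
    return i + 1, float(objective[i])

# Configuration [P-P] with hypothetical parameter values p = a = 1 and s = 0.
p, a, s = 1.0, 1.0, 0.0
j = np.arange(1, 1001, dtype=float)
omega, b, gamma = j ** (2 * s), j ** (2 * p), j ** (-2 * a)

ns = np.array([1e3, 1e4, 1e5, 1e6])
rates = np.array([oracle_dimension(n, omega, b, gamma)[1] for n in ns])
slope = np.polyfit(np.log(ns), np.log(rates), 1)[0]
predicted = -(2 * p - 2 * s) / (2 * a + 2 * p + 1)   # exponent stated for [P-P]
print(slope, predicted)
```

Because the dimension is an integer, the fitted slope only approximates the predicted exponent $-(2p-2s)/(2a+2p+1) = -0.4$ for these values; the agreement improves as $n$ grows.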



3.3 Minimax-optimality of the data-driven estimation procedure

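Consider the thresholded projection estimator $\hat\beta_{\hat m}$ with data-driven choice $\hat m$ of the dimension
parameter. Supposing that the joint distribution of the random function $X$ and the error term
$\varepsilon$ satisfies certain additional conditions, we will prove below that Assumptions 2.1 and
2.2 formulated in Section 2 hold true. These assumptions rely on the existence of sequences
$(m_n^\diamond)_{n\ge1}$ and $(M_n^+)_{n\ge1}$ which, amongst others, we define now referring only to the classes $\mathcal{F}_b^r$
and $\mathcal{G}_\gamma^d$. Keep in mind the notations given in (2.4) and (2.10). For $m\ge1$ and $K = ([\nabla_\gamma]_m)_{m\ge1}$
define $\Delta_m^\gamma := \Delta_m(K)$ and $\delta_m^\gamma := \delta_m(K)$, where $\Delta_m^\gamma = \max_{1\le j\le m}\omega_j\gamma_j^{-1}$. Moreover, for $n\ge1$ we
set $M_n^- := M_n(a)$ with $a = (16d^3\gamma_m^{-1})_{m\ge1}$ and $M_n^+ := M_n(a)$ with $a = ([4d\gamma_m]^{-1})_{m\ge1}$. Taking
into account these notations we define for $n\ge1$

$$m_n^\diamond := \arg\min_{1\le m\le M_n^-}\Bigl\{\max\Bigl(\frac{\omega_m}{b_m},\,\frac{\delta_m^\gamma}{n}\Bigr)\Bigr\} \quad\text{and}\quad \mathcal{R}_n^\diamond := \max\Bigl(\frac{\omega_{m_n^\diamond}}{b_{m_n^\diamond}},\,\frac{\delta^\gamma_{m_n^\diamond}}{n}\Bigr)$$

satisfying $m_n^\diamond\le M_n^-\le M_n^+$. Furthermore, let $\Sigma := \Sigma(\mathcal{G}_\gamma^d)$ denote a finite constant such that

$$\Sigma \ge \sum_{j\ge1}\gamma_j, \quad\text{and}\quad \Sigma \ge \sum_{m\ge1}\Delta_m^\gamma\exp\Bigl(-\frac{m\log(\Delta_m^\gamma\vee(m+2))}{16(1+\log(d))\log(m+2)}\Bigr) \tag{3.3}$$

which by construction always exists and depends on the class $\mathcal{G}_\gamma^d$ only. Let us illustrate the last
definition by revisiting the three configurations for the sequences $\omega$, $b$, and $\gamma$ (Illustration 3.1).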

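Illustration (continued) 3.3. In the following we state the order of $M_n^-$ and $\delta_m^\gamma$ which in
turn are used to derive the order of $m_n^\diamond$ and $\mathcal{R}_n^\diamond$.

[P-P] $M_n^-\sim(\frac{n}{1+\log n})^{1/(1+2a+(2s)_+)}$, $\delta_m^\gamma\sim m^{1+(2a+2s)_+}$, and for all $p > (s)_+$ it follows $m_n^\diamond\sim
n^{1/[1+2p-2s+(2a+2s)_+]}$ and $\mathcal{R}_n^\diamond\sim n^{-2(p-s)/[1+2p-2s+(2a+2s)_+]}$.

[E-P] $M_n^-\sim(\frac{n}{1+\log n})^{1/(1+2a+(2s)_+)}$, $\delta_m^\gamma\sim m^{1+(2a+2s)_+}$, and for all $p > 0$ it follows $m_n^\diamond\sim(\log n-
\frac{1+2(a+s)_+-2s}{2p}\log(\log n))^{1/(2p)}$ and $\mathcal{R}_n^\diamond\sim n^{-1}(\log n)^{[1+2(a+s)_+]/(2p)}$.

[P-E] $M_n^-\sim(\log n-\frac{1+2a+(2s)_+}{2a}\log(\log n))^{1/(2a)}$, $\delta_m^\gamma\sim m^{1+2s+2a}\exp(m^{2a})(\log m)^{-1}$, and for $p > (s)_+$
it follows $m_n^\diamond\sim(\log n-\frac{1+2a+2p}{2a}\log(\log n))^{1/(2a)}$ and $\mathcal{R}_n^\diamond\sim(\log n)^{-(p-s)/a}$. □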

We proceed by formalizing the additional conditions on the joint distribution of $\varepsilon$ and $X$
which in turn are used to prove that Assumptions 2.1 and 2.2 hold true.

Imposing a joint normal distribution. Let us first assume that $X$ is a centered Gaussian
$\mathbb H$-valued random variable, that is, for all $k\ge1$ and for all finite collections $\{h_1,\dots,h_k\}\subset\mathbb H$ the
joint distribution of the real valued random variables $\langle X,h_1\rangle_{\mathbb H},\dots,\langle X,h_k\rangle_{\mathbb H}$ is Gaussian with
zero mean vector and covariance matrix with generic elements $\mathbb E\langle h_j,X\rangle_{\mathbb H}\langle X,h_l\rangle_{\mathbb H}$, $1\le j,l\le k$.
Moreover, suppose that the error term is standard normally distributed. The next assumption
summarizes this situation.

Assumption 3.2. The joint distribution of the random function $X$ and the error $\varepsilon$ is normal.

The proof of the next assertion is more involved and hence deferred to Appendix C.

Proposition 3.1. Assume an i.i.d. $n$-sample of $(Y,X)$ obeying (1.1) and Assumption 3.2.
Consider sequences $\omega$, $b$ and $\gamma$ satisfying Assumption 3.1 and in the definition (2.6) and (2.11)
of the penalty $\widehat{\mathrm{pen}}$ and $\mathrm{pen}$ respectively set $\kappa=96$. For the classes $\mathcal F_b^r$ and $\mathcal G_\gamma^d$, there exist finite
constants $C_1:=C_1(d)$ and $C_2:=C_2(d)$ depending on $d$ only such that Assumptions 2.1 and
2.2 hold true, with $K_1:=C_1(\sigma^2+r)\Sigma$ and $K_2:=C_2(\sigma^2+r)\Sigma$ respectively.

By taking the value $\kappa=96$ the random penalty $\widehat{\mathrm{pen}}$ and the random upper bound $\widehat M$ given
in (2.6) and (2.5) respectively depend indeed only on the data, and hence the choice $\widehat m$ of the
dimension parameter in (2.7) is fully data-driven. Moreover, due to the last assertion we can
apply Proposition 2.2 which in turn provides the key argument to prove the following upper
risk-bound for the data-driven thresholded projection estimator $\widehat\beta_{\widehat m}$ with $\widehat m$ given by (2.7).

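Theorem 3.2. Let the assumptions of Proposition 3.1 be satisfied. There exists a finite constant
$K:=K(d)$ depending on $d$ only such that

$$\mathcal R_\omega[\widehat\beta_{\widehat m};\mathcal F_b^r,\mathcal G_\gamma^d]\le K(\sigma^2+r)\{R_n^\diamond+\Sigma\,n^{-1}\}\quad\text{for all }n\ge1.$$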

Proof of Theorem 3.2. We shall provide in the appendix, among others, the two technical
Lemmas B.1 and B.2 which are used in the following. Moreover, denote by $K:=K(d)$ a
constant depending on $d$ only which changes from line to line. Making use of Proposition 3.1,
i.e., Assumptions 2.1 and 2.2 are satisfied, we can apply Proposition 2.2, and hence for all $n\ge1$

$$\mathcal R_\omega[\widehat\beta_{\widehat m};\mathcal F_b^r,\mathcal G_\gamma^d]\le504\sup_{\beta\in\mathcal F_b^r}\sup_{\Gamma\in\mathcal G_\gamma^d}\{\mathrm{pen}_{m_n^\diamond}+\mathrm{bias}^2_{m_n^\diamond}\}+K(\sigma^2+r)\Sigma\,n^{-1}.\qquad(3.4)$$

Furthermore, if $\beta\in\mathcal F_b^r$ and $\Gamma\in\mathcal G_\gamma^d$ then firstly from (B.4) in Lemma B.1 follows that $\mathrm{bias}^2_{m_n^\diamond}\le
34\,d^8r\,\omega_{m_n^\diamond}b^{-1}_{m_n^\diamond}$ because $\gamma^2\omega^{-1}$ and $\omega b^{-1}$ are non increasing due to Assumption 3.1. Secondly,
by combination of (i) and (iv) in Lemma B.2, it is easily verified that $\mathrm{pen}_{m_n^\diamond}\le K(\sigma^2+r)\delta^\gamma_{m_n^\diamond}n^{-1}$.
Consequently, $\sup_{\beta\in\mathcal F_b^r}\sup_{\Gamma\in\mathcal G_\gamma^d}\{\mathrm{pen}_{m_n^\diamond}+\mathrm{bias}^2_{m_n^\diamond}\}\le K(\sigma^2+r)R_n^\diamond$ for all $n\ge1$ by combination
of the last two estimates and the definition of $R_n^\diamond$, which in turn together with the upper bound
(3.4) implies the assertion of the theorem. □

Imposing moment conditions. We now dismiss Assumption 3.2 and formalize in its place
conditions on the moments of the random function $X$ and the error term $\varepsilon$. In particular we
use that for all $h\in\mathbb H$ with $\langle\Gamma h,h\rangle_{\mathbb H}=1$, the random variable $\langle h,X\rangle_{\mathbb H}$ is standardized, i.e. has
mean zero and variance one.
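
The standardization used here can be spelled out in one line. The sketch below assumes, as elsewhere in the paper, that $X$ is centered and that $\Gamma$ denotes its covariance operator.

```latex
\begin{align*}
\mathbb E\langle h,X\rangle_{\mathbb H} &= \langle h,\mathbb E X\rangle_{\mathbb H} = 0,\\
\operatorname{Var}\langle h,X\rangle_{\mathbb H} &= \mathbb E\langle h,X\rangle_{\mathbb H}^2
  = \langle\Gamma h,h\rangle_{\mathbb H} = 1 .
\end{align*}
```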

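Assumption 3.3. There exist a finite integer $k\ge16$ and a finite constant $\eta\ge1$ such that
$\mathbb E|\varepsilon|^{4k}\le\eta^{4k}$ and that for all $h\in\mathbb H$ with $\langle\Gamma h,h\rangle_{\mathbb H}=1$ the standardized random variable $\langle h,X\rangle_{\mathbb H}$
satisfies $\mathbb E|\langle h,X\rangle_{\mathbb H}|^{4k}\le\eta^{4k}$.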

It is worth noting that for any Gaussian random function $X$ with finite second moment
Assumption 3.3 holds true, since for all $h$ with $\langle\Gamma h,h\rangle_{\mathbb H}=1$ the random variable $\langle h,X\rangle_{\mathbb H}$ is
standard normally distributed and hence $\mathbb E|\langle h,X\rangle_{\mathbb H}|^{2k}=(2k-1)\cdot\ldots\cdot5\cdot3\cdot1$. The proof of the next
assertion is again rather involved and deferred to Appendix D. It follows, however, along the
general lines of the proof of Proposition 2.2 though it is not a straightforward extension. Take
as an example the concentration inequality for the random variable $\|[\Gamma]_m^{-1/2}([\widehat g]_m-[\widehat\Gamma]_m[\beta^m]_m)\|$
in Lemma C.3 in Appendix C which due to Assumption 3.2 is shown by employing elementary
inequalities for Gaussian random variables. In contrast, the proof of an analogous result under
Assumption 3.3 given in Lemma D.3 in Appendix D is based on an inequality due to Talagrand
[1996] (Proposition D.1 in the appendix states a version as presented in Klein and Rio [2005]).
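
The Gaussian moment identity quoted above, $\mathbb E Z^{2k}=(2k-1)\cdot\ldots\cdot3\cdot1$ for a standard normal $Z$, can be checked numerically. The following is a minimal sketch only: the function names and the Simpson quadrature on a truncated interval are illustrative choices, not part of the paper.

```python
import math


def double_factorial_odd(k):
    """Return (2k-1)!! = (2k-1)*(2k-3)*...*3*1 for k >= 1."""
    result = 1
    for odd in range(1, 2 * k, 2):
        result *= odd
    return result


def gaussian_even_moment(k, half_width=12.0, steps=20000):
    """Approximate E[Z^(2k)] for Z ~ N(0,1) by the composite Simpson rule
    on [-half_width, half_width]; `steps` must be even."""
    h = 2.0 * half_width / steps

    def integrand(z):
        # z^(2k) times the standard normal density
        return z ** (2 * k) * math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

    total = integrand(-half_width) + integrand(half_width)
    for i in range(1, steps):
        weight = 4.0 if i % 2 == 1 else 2.0
        total += weight * integrand(-half_width + i * h)
    return total * h / 3.0


if __name__ == "__main__":
    for k in (1, 2, 3, 4):
        print(k, double_factorial_odd(k), gaussian_even_moment(k))
```

For small $k$ the two columns should agree to many digits; the check says nothing about how large $\eta$ has to be in Assumption 3.3, only that Gaussian moments of every order are finite and explicit.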

Proposition 3.3. Assume an i.i.d. $n$-sample of $(Y,X)$ obeying (1.1) and Assumption 3.3.
Consider sequences $\omega$, $b$ and $\gamma$ satisfying Assumption 3.1 and in the definition (2.6) and (2.11)
of the penalty $\widehat{\mathrm{pen}}$ and $\mathrm{pen}$ respectively, set $\kappa=288$. For the classes $\mathcal F_b^r$ and $\mathcal G_\gamma^d$, there exist
finite constants $C_1:=C_1(\sigma,\eta,\mathcal F_b^r,\mathcal G_\gamma^d)$ depending on $\sigma$, $\eta$ and the classes $\mathcal F_b^r$ and $\mathcal G_\gamma^d$ only,
and $C_2:=C_2(d)$ depending on $d$ only, such that Assumptions 2.1 and 2.2 hold true with
$K_1:=C_1\eta^{64}(\sigma^2+r)\Sigma$ and $K_2:=C_2\eta^{64}(\sigma^2+r)\Sigma$ respectively.

We remark a change only in the constants when comparing the last proposition with
Proposition 3.1. Note further that we need a larger value for the constant $\kappa$ than in Proposition 3.1,
although it is still a numerical constant, and hence the choice $\widehat m$ given by (2.7) is again fully
data-driven. Moreover, both values for the constant $\kappa$, though convenient for deriving the
theory, may be far too large in practice and may instead be determined by means of a simulation study
as in Comte et al. [2006], for example. The next assertion provides an upper risk-bound for the
data-driven thresholded projection estimator $\widehat\beta_{\widehat m}$ when imposing moment conditions.

Theorem 3.4. Let the assumptions of Proposition 3.3 be satisfied. There exist finite constants
$K:=K(d)$ depending on $d$ only and $K':=K'(\sigma,\eta,\mathcal F_b^r,\mathcal G_\gamma^d)$ depending on $\sigma$, $\eta$ and the classes
$\mathcal F_b^r$ and $\mathcal G_\gamma^d$ only such that

$$\mathcal R_\omega[\widehat\beta_{\widehat m};\mathcal F_b^r,\mathcal G_\gamma^d]\le K(\sigma^2+r)\{R_n^\diamond+K'\eta^{64}\,\Sigma\,n^{-1}\}\quad\text{for all }n\ge1.$$

Proof of Theorem 3.4. Taking into account Proposition 3.3 rather than Proposition 3.1 we
follow line by line the proof of Theorem 3.2 and hence we omit the details. □



Minimax-optimality. A comparison of the upper bounds in both Theorem 3.2 and Theorem
3.4 with the lower bound displayed in (3.2) shows that the data-driven estimator $\widehat\beta_{\widehat m}$ attains
up to a constant the minimax-rate $R_n^*=\min_{1\le m<\infty}\{\max(\omega_m/b_m,\sum_{j=1}^m\omega_j/(n\gamma_j))\}$ only if
$R_n^\diamond=\min_{1\le m\le M_n^-}\{\max(\omega_m/b_m,\delta_m^\gamma/n)\}$ has the same order as $R_n^*$. Note that, by construction,
$\delta_m^\gamma\ge\sum_{j=1}^m\omega_j/\gamma_j$ for all $m\ge1$. The next assertion is an immediate consequence of Theorem 3.2 and
Theorem 3.4 and we omit its proof.

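Corollary 3.5. Let the assumptions of either Theorem 3.2 or Theorem 3.4 be satisfied. If in
addition $\sup_{n\ge1}\{R_n^\diamond/R_n^*\}<\infty$ holds true, then we have for all $n\ge1$

$$\mathcal R_\omega[\widehat\beta_{\widehat m};\mathcal F_b^r,\mathcal G_\gamma^d]\le C\,\inf_{\tilde\beta}\mathcal R_\omega[\tilde\beta;\mathcal F_b^r,\mathcal G_\gamma^d]$$

where the infimum is taken over all possible estimators $\tilde\beta$ and $C$ is a finite positive constant.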

Remark 3.2. In the last assertion, $\sup_{n\ge1}\{R_n^\diamond/R_n^*\}<\infty$ is for example satisfied if
the following two conditions hold simultaneously true: (i) $m_n^*\le M_n^-$ for all $n\ge1$ and (ii)
$\Delta_m^\gamma=\max_{1\le j\le m}\omega_j\gamma_j^{-1}\le Cm^{-1}\sum_{j=1}^m\omega_j\gamma_j^{-1}$ and $\log(\Delta_m^\gamma\vee(m+2))\le C\log(m+2)$ for all
$m\ge1$. Observe that (ii), which implies $\delta_m^\gamma\le C'\sum_{j=1}^m\omega_j\gamma_j^{-1}$, is satisfied in case $\Delta_m^\gamma$ is in the order
of a power of $m$ (e.g. Illustration 3.2 [P-P] and [E-P]). If this term has an exponential order
with respect to $m$ (e.g. Illustration 3.2 [P-E]), then a deterioration of the term $\delta_m^\gamma$ compared
to the variance term $\sum_{j=1}^m\omega_j\gamma_j^{-1}$ is possible. However, no loss in terms of the rate may occur,
i.e., the supremum above remains finite, when the squared-bias term $\omega_{m_n^*}b^{-1}_{m_n^*}$ dominates the variance term $n^{-1}\delta^\gamma_{m_n^*}$ (for a
detailed discussion in a deconvolution context we refer to Butucea and Tsybakov [2007a,b]). □
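
The comparison between the variance term $\sum_{j\le m}\omega_j/\gamma_j$ and its penalized counterpart $\delta_m^\gamma=m\Delta_m^\gamma\Lambda_m^\gamma$ can be explored numerically. The sketch below is illustrative only: the function names are hypothetical, the cap `m_cap` merely stands in for $M_n^-$ (whose definition via (2.4) and (2.10) is not reproduced here), and the polynomial sequences $\omega_j=j^{2s}$, $b_j=j^{2p}$, $\gamma_j=j^{-2a}$ are an assumed form of a [P-P]-type configuration.

```python
import math


def delta_sequence(omega, gamma):
    """delta_m = m * Delta_m * Lambda_m for m = 1..len(omega), with
    Delta_m = max_{j<=m} omega_j/gamma_j and
    Lambda_m = log(max(Delta_m, m+2)) / log(m+2)."""
    deltas, running_max = [], 0.0
    for m, (w, g) in enumerate(zip(omega, gamma), start=1):
        running_max = max(running_max, w / g)
        lam = math.log(max(running_max, m + 2)) / math.log(m + 2)
        deltas.append(m * running_max * lam)
    return deltas


def oracle_rate(n, omega, b, gamma):
    """min over m of max(omega_m/b_m, sum_{j<=m} omega_j/(n*gamma_j))."""
    best, variance = math.inf, 0.0
    for w, bm, g in zip(omega, b, gamma):
        variance += w / (n * g)
        best = min(best, max(w / bm, variance))
    return best


def penalised_rate(n, omega, b, gamma, m_cap):
    """min over 1 <= m <= m_cap of max(omega_m/b_m, delta_m/n)."""
    deltas = delta_sequence(omega, gamma)
    return min(max(omega[m] / b[m], deltas[m] / n) for m in range(m_cap))


def polynomial_sequences(m_max, p, s, a):
    """Assumed polynomial configuration: omega_j=j^(2s), b_j=j^(2p), gamma_j=j^(-2a)."""
    js = range(1, m_max + 1)
    omega = [float(j) ** (2 * s) for j in js]
    b = [float(j) ** (2 * p) for j in js]
    gamma = [float(j) ** (-2 * a) for j in js]
    return omega, b, gamma


if __name__ == "__main__":
    omega, b, gamma = polynomial_sequences(200, p=2.0, s=0.0, a=1.0)
    for n in (10**3, 10**4, 10**5):
        print(n, oracle_rate(n, omega, b, gamma),
              penalised_rate(n, omega, b, gamma, m_cap=200))
```

Since $\delta_m^\gamma\ge\sum_{j\le m}\omega_j/\gamma_j$, the second printed value is never smaller than the first; in a polynomial configuration their ratio should stay bounded as $n$ grows, which is the situation described in condition (ii).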

Let us illustrate the performance of the data-driven thresholded projection estimator $\widehat\beta_{\widehat m}$
considering the three configurations for the sequences $\omega$, $b$, and $\gamma$ (see Illustration 3.1 above).

Proposition 3.6. Assume an i.i.d. $n$-sample of $(Y,X)$ satisfying (1.1) and let either
Assumption 3.2 or Assumption 3.3 hold true where we set respectively $\kappa=96$ or $\kappa=288$ in (2.6).
The fully data-driven estimator $\widehat\beta_{\widehat m}$ attains the minimax-rates $R_n^*$ (see Illustration 3.2), up to a
constant, in the three cases introduced in Illustration 3.1, if we additionally assume $a+s\ge0$
in the cases [P-P] and [E-P].

Proof of Proposition 3.6. Under the stated conditions it is easily verified that the
assumptions of either Theorem 3.2 or Theorem 3.4 are satisfied. Moreover, the rates $R_n^*$ (Illustration
3.2) and $R_n^\diamond$ (Illustration 3.3) are of the same order if we additionally assume $a+s\ge0$ in the
cases [P-P] and [E-P]. Therefore we can apply Corollary 3.5 which implies the assertion. □

Appendix 

This section gathers preliminary technical results and the proofs of Proposition 3.1 and 3.3. 
A Notations

We begin by defining and recalling notations to be used in all proofs. Given $m\ge1$, $\mathbb H_m$
denotes the subspace of $\mathbb H$ spanned by the functions $\{\psi_1,\dots,\psi_m\}$. $\Pi_m$ and $\Pi_m^\perp$ denote the
orthogonal projections on $\mathbb H_m$ and its orthogonal complement $\mathbb H_m^\perp$, respectively. If $K$ is an
operator mapping $\mathbb H$ to itself and if we restrict $\Pi_mK\Pi_m$ to an operator from $\mathbb H_m$ to itself,
then it can be represented by a matrix $[K]_m$ with generic entries $\langle\psi_j,K\psi_l\rangle_{\mathbb H}=:[K]_{j,l}$ for
$1\le j,l\le m$. The spectral norm of $[K]_m$ is denoted by $\|[K]_m\|_s$ and the inverse matrix of $[K]_m$
by $[K]_m^{-1}$. Furthermore, keeping in mind the notations given in (2.4) and (2.10) we use for all
$m\ge1$ and $n\ge1$

A- = A^([V.]), A^ = A^([r]), := + = mA^ A^ = <5^([r]), 

A^ = max ujjr^ = A^([V^]), A;^ := ^''^fzl^^9^'^^^ ^ = = S^r^Hv^]), 

A„ = A^([r]), Am ,= lQg(^"^ V(m + 2)) Sm:=mAmAm = 6mm), 

Iog(m + 2) 

M = M„((||[r]-i||,)^^i), M- = M„(16ciV'), M+ = Mni[Mj]-'), 

pen^ = Ka^mA^A^n'^ and pen^ = MKa^mA^A^n"^ (A.l) 

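Recall that $[\widehat\Gamma]_m=\frac1n\sum_{i=1}^n[X_i]_m[X_i]_m^t$ and $[\widehat g]_m=\frac1n\sum_{i=1}^nY_i[X_i]_m$ where $[\Gamma]_m=\mathbb E[X]_m[X]_m^t$
and $[g]_m=\mathbb EY[X]_m$. Given a Galerkin solution $\beta^m\in\mathbb H_m$, $m\ge1$, of equation (1.2), let
$Z_m:=Y-\langle\beta^m,X\rangle_{\mathbb H}=\sigma\varepsilon+\langle\beta-\beta^m,X\rangle_{\mathbb H}$, and denote $\rho_m^2:=\mathbb EZ_m^2=\sigma^2+\langle\Gamma(\beta-\beta^m),(\beta-\beta^m)\rangle_{\mathbb H}$,
$\sigma_Y^2:=\mathbb EY^2=\sigma^2+\langle\Gamma\beta,\beta\rangle_{\mathbb H}$ and $\sigma_m^2=2(\sigma_Y^2+[g]_m^t[\Gamma]_m^{-1}[g]_m)$ where we used that $\varepsilon$ and $X$ are
uncorrelated. Define the random matrix $[\Xi]_m$ and random vector $[W]_m$ respectively by

$$[\Xi]_m:=[\Gamma]_m^{-1/2}[\widehat\Gamma]_m[\Gamma]_m^{-1/2}-[\mathrm{Id}]_m,\quad\text{and}\quad[W]_m:=[\widehat g]_m-[\widehat\Gamma]_m[\beta^m]_m,$$

where $\mathbb E[\Xi]_m=0$, because $\mathbb E[\widehat\Gamma]_m=[\Gamma]_m$, and $\mathbb E[W]_m=[\Gamma(\beta-\beta^m)]_m=0$. Define further
$\widehat\sigma_Y^2:=n^{-1}\sum_{i=1}^nY_i^2$ and the events

$$\Omega_{m,n}:=\{\|[\widehat\Gamma]_m^{-1}\|_s\le n\},\qquad\mho_{m,n}:=\{8\|[\Xi]_m\|_s\le1\},$$
$$A_n:=\{1/2\le\widehat\sigma_Y^2/\sigma_Y^2\le3/2\},\qquad B_n:=\{\|[\Xi]_k\|_s\le1/8,\ \forall\,1\le k\le M_n^\omega\},$$
$$C_n:=\{[W]_k^t[\Gamma]_k^{-1}[W]_k\le\tfrac18([g]_k^t[\Gamma]_k^{-1}[g]_k+\sigma_Y^2),\ \forall\,1\le k\le M_n^\omega\},\qquad(A.2)$$

and their complements $\Omega_{m,n}^c$, $\mho_{m,n}^c$, $A_n^c$, $B_n^c$ and $C_n^c$, respectively. Furthermore, we will denote
by $C$ universal numerical constants and by $C(\cdot)$ constants depending only on the arguments.
In both cases, the values of the constants may change from line to line.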

B Preliminary results 

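This section gathers preliminary results where we only exploit that the sequences $\omega$, $b$ and $\gamma$
satisfy Assumption 3.1. The proof of the next lemma can be found in Johannes and Schenk
[2010].

Lemma B.1. Let $\Gamma\in\mathcal G_\gamma^d$ where the sequence $\gamma$ satisfies Assumption 3.1, then we have

$$\sup_{m\in\mathbb N}\{\gamma_m\|[\Gamma]_m^{-1}\|_s\}\le4d^3,\qquad(B.1)$$
$$\sup_{m\in\mathbb N}\|[\nabla_\gamma]_m^{1/2}[\Gamma]_m^{-1}[\nabla_\gamma]_m^{1/2}\|_s\le4d^3,\qquad(B.2)$$
$$\sup_{m\in\mathbb N}\|[\nabla_\gamma]_m^{-1/2}[\Gamma]_m[\nabla_\gamma]_m^{-1/2}\|_s\le d.\qquad(B.3)$$

Let in addition $\beta\in\mathcal F_b^r$ with sequence $b$ satisfying Assumption 3.1. If $\beta^m$ denotes a Galerkin
solution of $g=\Gamma\beta$ then for each strictly positive sequence $w:=(w_j)_{j\ge1}$ such that $w/b$ is non
increasing we obtain for all $m\in\mathbb N$

$$\|\beta-\beta^m\|_w^2\le34\,d^8\,r\,\frac{w_m}{b_m}\max\Big(1,\frac{\gamma_m^2}{w_m}\max_{1\le j\le m}\frac{w_j}{\gamma_j^2}\Big),\qquad(B.4)$$
$$\|\beta^m\|_b^2\le34\,d^8\,r\quad\text{and}\quad\|\Gamma^{1/2}(\beta-\beta^m)\|_{\mathbb H}^2\le34\,d^9\,r\,\gamma_mb_m^{-1}.\qquad(B.5)$$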
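Lemma B.2. Let $D:=(4d^3)$ and let Assumption 3.1 be satisfied. If $\Gamma\in\mathcal G_\gamma^d$ then it holds

(i) $d^{-1}\le\gamma_m\|[\Gamma]_m^{-1}\|_s\le D$, $d^{-1}\le\Delta_m/\Delta_m^\gamma\le D$, $(1+\log d)^{-1}\le\Lambda_m/\Lambda_m^\gamma\le(1+\log D)$, and
$d^{-1}(1+\log d)^{-1}\le\delta_m/\delta_m^\gamma\le D(1+\log D)$ for all $m\ge1$,

(ii) $\delta^\gamma_{M_n^+}\le n4D(1+\log D)$ and $\delta_{M_n^+}\le n4D^2(1+2\log D)$ for all $n\ge1$,

(iii) $n\ge2\max_{1\le m\le M_n^+}\|[\Gamma]_m^{-1}\|_s$ if $n\ge2D$ and $\Delta^\omega_{M_n^+}M_n^+(1+\log n)\ge8D^2$.

If in addition $\beta\in\mathcal F_b^r$ then we have for all $m\ge1$

(iv) $\rho_m^2\le\sigma_m^2\le2(\sigma^2+35d^9r)$.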

Proof of Lemma B.2. Proof of (i). Due to (B.1) and (B.3) in Lemma B.1, we have for all
$\Gamma\in\mathcal G_\gamma^d$ and for all $m\ge1$ that $\|[\Gamma]_m^{-1}\|_s\le4d^3\gamma_m^{-1}$ and $\gamma_m^{-1}\le d\|[\Gamma]_m^{-1}\|_s$. Thus, given $D=(4d^3)$
for all $m\ge1$ we have $d^{-1}\le\|[\Gamma]_m^{-1}\|_s\gamma_m\le D$. Moreover, the monotonicity of $\gamma$ implies
$d^{-1}\le\gamma_M\max_{1\le m\le M}\|[\Gamma]_m^{-1}\|_s\le D$. From these estimates we obtain (i).

Proof of (ii). Observe that $\Delta^\gamma_{M_n^+}\le\Delta^\omega_{M_n^+}\gamma^{-1}_{M_n^+}$. In case $M_n^+=1$ the assertion is trivial,
since $\Delta_1^\gamma=1$ due to Assumption 3.1. Thus, consider $M_n^+>1$, which implies
$\min_{1\le j\le M_n^+}\{\gamma_j(j\Delta_j^\omega)^{-1}\}\ge(1+\log n)(4Dn)^{-1}$, and hence $M_n^+\Delta^\gamma_{M_n^+}\le4Dn(1+\log n)^{-1}$, $\Lambda^\gamma_{M_n^+}\le
(1+\log D)(1+\log n)$, $M_n^+\Delta_{M_n^+}\le4D^2n(1+\log n)^{-1}$ and $\Lambda_{M_n^+}\le(1+2\log D)(1+\log n)$. The
assertion (ii) follows now by combination of these estimates.

Proof of (iii). By employing that $D\gamma^{-1}_{M_n^+}\ge\max_{1\le m\le M_n^+}\|[\Gamma]_m^{-1}\|_s$, the assertion (iii) follows in
case $M_n^+=1$ from $\gamma_1=1$, while in case $M_n^+>1$, we use $M_n^+\Delta^\omega_{M_n^+}\gamma^{-1}_{M_n^+}\le4Dn(1+\log n)^{-1}$.

Proof of (iv). Since $\varepsilon$ and $X$ are centered it follows from $[\beta^m]_m=[\Gamma]_m^{-1}[g]_m$ that $\rho_m^2\le
2(\mathbb EY^2+\mathbb E|\langle\beta^m,X\rangle_{\mathbb H}|^2)=2(\sigma_Y^2+[g]_m^t[\Gamma]_m^{-1}[g]_m)=\sigma_m^2$. Moreover, by employing successively
the inequality of Heinz [1951], i.e. $\|\Gamma^{1/2}\beta\|_{\mathbb H}^2\le d\|\beta\|_\gamma^2$, and Assumption 3.1, i.e., $\gamma$ and $b^{-1}$ are
non-increasing, the identity $\sigma_Y^2=\sigma^2+\langle\Gamma\beta,\beta\rangle_{\mathbb H}$ implies

$$\sigma_Y^2\le\sigma^2+d\|\beta\|_\gamma^2\le\sigma^2+dr.\qquad(B.6)$$

Furthermore, from (B.3) and (B.5) in Lemma B.1 we obtain

$$[g]_m^t[\Gamma]_m^{-1}[g]_m\le d\|\beta^m\|_\gamma^2\le34\,d^9r.\qquad(B.7)$$

The assertion (iv) follows now from (B.6) and (B.7), which completes the proof. □
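
The last step of (iv) can be made explicit. The following sketch reads (B.6) as $\sigma_Y^2\le\sigma^2+dr$ and (B.7) as $[g]_m^t[\Gamma]_m^{-1}[g]_m\le34\,d^9r$, and it additionally assumes $d\ge1$, which is what allows $dr$ to be absorbed into the $d^9r$ term.

```latex
\begin{align*}
\sigma_m^2
  &= 2\big(\sigma_Y^2 + [g]_m^t[\Gamma]_m^{-1}[g]_m\big) \\
  &\le 2\big(\sigma^2 + d\,r + 34\,d^9 r\big)
   \le 2\big(\sigma^2 + 35\,d^9 r\big) \qquad (d \ge 1).
\end{align*}
```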
Lemma B.3. For all n, m ^ 1 we have 

|| < ^ 4,V1 ^ m ^ M-| C {m- ^ M ^ M+}. 

Proof of Lemma B.3. Let = ||[f]^^||^^ and recall that l^M^M!;!^ with 

/ TM+i ^ 1+logn \ A//" — 1 



mm 



Given Tm '■= || [r]^^ H^"*^ we have D^^ ^ Tm/jm ^ d, m ^ 1 due to (i) in Lemma B.2 which we 
use to prove the following two assertions 

\m<M~}c\ min : — <H, (B.8) 

\m > M+) C I max —^41. (B.9) 

Obviously, the assertion of Lemma B.3 follows now by combination of (B.8) and (B.9). 
Consider (B.8) which is trivial in case M~ = 1. If M~ > 1 we have min > 4D(i+iogn) 

and, hence min —^5- ^ ^(^+^"g"-) _ gy exploiting the last estimate we obtain 

M--1 



|m < m;^} n |m < M"} = IJ |m = m} 

M=l 

C ^" I ^ I "^^+1 < 1 + ^og \ = / inin ^ 1 + log 'T- 1 

Hl(^ + l)A'^+i ri i~\2^T^M-mA^ n j 

( 



C <{ min _ — < 1/4 



while trivially |m = M;;;'| n |m < M^j = 0, which proves (B.8) because M" ^ M^. 

(M++l)A'- 



Consider (B.9) which is trivial in case M+ = M^. If M+ < M^, then ^,^+^^f+^ < 
and hence 

{m > 1} n {m > M+} = IJ |m = m} 

M=M++1 

^ I 1 f . . l + logn) I . Tm . 1 + logn 

C II <^ mm — — ^ } = < mm — — ^ 



M=M++1 



2<m<(M++l) mA^;^; 



n 



C 

rM++i J 

while trivially {M = 1} n {M > M+} = which shows (B.9) and completes the proof. □ 

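Lemma B.4. Let $A_n$, $B_n$ and $C_n$ be as in (A.2). For all $n\ge1$ it holds true that

$$A_n\cap B_n\cap C_n\subset\{\mathrm{pen}_k\le\widehat{\mathrm{pen}}_k\le72\,\mathrm{pen}_k,\ 1\le k\le M_n^\omega\}\cap\{M_n^-\le\widehat M\le M_n^+\}.$$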

Proof of Lemma B.4. Let ^ A; ^ 1. If ^ 1/8, i.e. on the event Bn, it is easily 

verified that ||([Id]fc + — [IdJ^H^ ^1/7 which we exploit to conclude 

6/711 [v.]f[r]^Hv.]flU ^ ll[v.]f [f]^^[v.]f II. ^ 8/7||[v.]f [r]^Mv.]f lU, 

6/711 [r]^^IU^ II [r]^^IU< 8/711 [r]^i|U and 
6/7x*[r]-^x ^ x*[f]^ ^x- ^ 8/7x*[r]-^x, for all x G M'^, (B.IO) 

and, consequently 

(6/7)[?]l[r]^M?]A, ^ [9]ini\9]k ^ {m)[gmt\9\k. (B.ll) 

Moreover, from ||[S]^||s ^ 1/8 we obtain after some algebra. 

Combining each of these estimates with (B.ll) yields 

(7/8)[?]i[f],i[?],<(33/16)b]i[r],i[5], + 4[W^]fe[r],i[Ty],. 

If in addition [PF]fc[r];^^[VF]fe ^ Mflfc + f^y)j i-^-, on the event C„, then the last two 

estimates imply respectively 

(7/16)([4[r]^ib]fc + 4) ^ (15/16)4 + (7/3)[?]|[f]^^[^]fc, 

(7/8)[?]|[r]-n?]fe ^ (4i/i6)b]|[r]-n5]. + (1/2)4, 
and hence in case $1/2\le\widehat\sigma_Y^2/\sigma_Y^2\le3/2$, i.e., on the event $A_n$, we obtain 

(7/16)(b]i[r],M5]fe + 4) ^ (15/8)a^ + (7/3)[?]|[f]^i[ff],, 

iVmmirjk'idk + ^y) ^ (4i/i6)b]i[r]^i[5], + (29/16)4. 

Combining the last two estimates we have 

lmW]k'[9]k + 2a^y) ^ {2[g]i[f]^Hd]k + 2^Y) ^ S{2[g]i[r]l^[g]k + 2a'y). 
Since on the event AnHSnriCn the last estimate and (B.IO) hold for all 1 ^ ^ it follows 
AnnBnnCnC {{l/6)ai ^ai^ Sal, and (6/7) ^ Am ^ (8/7) A^, VI ^ m ^ il^}. 

FVom = '"^^itgTmSr^^ ^* ^"^"^^ ^^^"^ (^/"^^ ^ ^m/A^ ^ (8/7) impHes 
1/2 < (1 + log(7/6))-i < A^/Aj; ^ (1 + log(8/7)) ^ 3/2. 

Taking into account the last estimates and the definitions pen^ = Kcr^mAjJjAjJjn"^ andpen^ = 
14«;a^raA^A^n~^ we obtain 

AnDBnnCnC {pen^ < pefim < 72pen„, VI < m < M^}. (B.12) 

On the other hand, by exploiting successively (B.IO) and Lemma B.3 we have 



A: 



„n^„nCnc|^^^3^^^,vi^m^M;^| c{M-^iif^M+}. (B.13) 

From (B.12) and (B.13) follows the assertion of the lemma, which completes the proof. □ 

Lemma B.5. For all $m,n\ge1$ with $n\ge(8/7)\|[\Gamma]_m^{-1}\|_s$ we have $\mho_{m,n}\subset\Omega_{m,n}$.

Proof of Lemma B.5. Taking into account the identity $[\widehat\Gamma]_m=[\Gamma]_m^{1/2}\{[\mathrm{Id}]_m+[\Xi]_m\}[\Gamma]_m^{1/2}$ we
observe that $\|[\Xi]_m\|_s\le1/8$ implies $\|[\widehat\Gamma]_m^{-1}\|_s\le(8/7)\|[\Gamma]_m^{-1}\|_s$ due to the usual Neumann series
argument. If $n\ge(8/7)\|[\Gamma]_m^{-1}\|_s$, then the last assertion implies $\mho_{m,n}\subset\Omega_{m,n}$, which proves the
lemma. □
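
For readers who want the Neumann series step written out, the following short derivation is one way to fill it in. It assumes that $[\Gamma]_m$ is symmetric and positive definite, so that $\|[\Gamma]_m^{-1/2}\|_s^2=\|[\Gamma]_m^{-1}\|_s$, and it writes $[\widehat\Gamma]_m$ for the empirical counterpart of $[\Gamma]_m$.

```latex
\begin{align*}
[\widehat\Gamma]_m^{-1}
  &= [\Gamma]_m^{-1/2}\,\{[\mathrm{Id}]_m + [\Xi]_m\}^{-1}\,[\Gamma]_m^{-1/2}, \\
\big\|\{[\mathrm{Id}]_m + [\Xi]_m\}^{-1}\big\|_s
  &\le \sum_{j\ge 0} \|[\Xi]_m\|_s^{\,j}
   \le \frac{1}{1 - 1/8} = \frac{8}{7}, \\
\|[\widehat\Gamma]_m^{-1}\|_s
  &\le \|[\Gamma]_m^{-1/2}\|_s^2 \cdot \frac{8}{7}
   = \frac{8}{7}\,\|[\Gamma]_m^{-1}\|_s .
\end{align*}
```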

C Proof of Proposition 3.1 

We will suppose throughout this section that the conditions of Proposition 3.1 are satisfied and
thus Assumption 3.1 particularly holds true which allows us to employ Lemmas B.1-B.5
stated in Section B. Moreover, we show first technical assertions (Lemmas C.1-C.5) where we
exploit Assumption 3.2, i.e. $X$ and $\varepsilon$ are jointly normally distributed. They are used below to
prove that Assumptions 2.1 and 2.2 are satisfied (Propositions C.6 and C.7 respectively),
which is the claim of Proposition 3.1.

We begin by recalling elementary properties due to Assumption 3.2 which are frequently
used in this section. Given $f\in\mathbb H$ the random variable $\langle f,X\rangle_{\mathbb H}$ is normally distributed with
mean zero and variance $\langle\Gamma f,f\rangle_{\mathbb H}$. Consider the Galerkin solution $\beta^m$ and $h\in\mathbb H_m$; then the
random variables $\langle\beta-\beta^m,X\rangle_{\mathbb H}$ and $\langle h,X\rangle_{\mathbb H}$ are independent. Thereby, $Z_m=Y-\langle\beta^m,X\rangle_{\mathbb H}=
\sigma\varepsilon+\langle\beta-\beta^m,X\rangle_{\mathbb H}$ and $[X]_m$ are independent, normally distributed with mean zero, and,
respectively, variance $\rho_m^2$ and covariance matrix $[\Gamma]_m$. Consequently, $(\rho_m^{-1}Z_m,[X]_m^t[\Gamma]_m^{-1/2})$ is
an $(m+1)$-dimensional vector of i.i.d. standard normally distributed random variables. Let us
further state elementary inequalities for Gaussian random variables.

Lemma C.l. Let {Ui,Vij,l ^ i ^ n, 1 ^ j ^ m} be independent and standard normally 
distributed random variables. Then we have for all r] > and ^ ^ ^m/n 



P n-i/2^(i72-l) ^r? ^exp 



P\ n 



n 



1 



8 1 + ?7 n" 



-1/2 



1=1 



^^1/2 exp(^--mm{7? ,1/4} 



1=1 



>C\ ^exp(— )+exp(— ); 



(C.l) 
(C.2) 

(C.3) 



and for all 1 and ai, . . . , am ^ that 



E(Et/2-2cnJ ^16exp(^) 



^i=l 



(m 



n 



i=l 

n 



1 ,—cm. cm /—n. 

-4cm I ^I6exp(— )+32— exp(— ) 



i=l 



n n 



(m m \ 

.7 = 1 .7 = 1 ^ 



(C.4) 
(C.5) 

(C.6) 



Proof of Lemma C.1. Define $W:=\sum_{i=1}^nU_i^2$ and $Z_j:=(\sum_{i=1}^nU_i^2)^{-1/2}\sum_{i=1}^nU_iV_{ij}$. Obviously,
$W$ has a $\chi^2_n$ distribution with $n$ degrees of freedom and $Z_1,\dots,Z_m$, given $U_1,\dots,U_n$, are
independent and standard normally distributed, which we use below without further reference.
From the estimate (C.1) given in Dahlhaus and Polonik [2006] (Proposition A.1) follows





n 




[n 2 


1=1 





^ri^ ^P{n-'W^2)+E[P{2n-^\Zi\^^r]^\Ui,...,Un)] 



^ exp 



n 
16 



+ 



exp 



2 

T] n 



which implies (C.2). The estimate (C.3) follows analogously and we omit the details. By 
employing (C.l), 2c — 1 ^ c and n ^{cn + 1) ^ 1 we obtain (C.4). Indeed, 

(n \ noo / n \ 

J2ui-2cnj = j PU-^l''Yl{Uf-l)^n-^l\cn + t)\dt 

7o V 8 1 + n ^ 



, ,dt^ I exp I - — (cn + t)]dt 
{cn + t)J ^ Jo 16^ ^' 



exp 



cn\ 



t \ , I cn 

-jdt = 16exp( 



From the last estimate and (C.l) follows (C.5), because 



HE 



n 



icm 



cm 



Ui,...,Un 



+ 2cmn~^{W -2n) 



+ 



^ 16 exp - —yyu-^W] + 32 — exp - - J 



It remains to prove (C.6) which can be realized as follows (keep in mind that E[l^^] = n(n+2)) 



/ m 


n 


2\ 2 








1 = E 






i=l 







3=1 



Ul,...,Un 



□ 



Lemma C.2. For all n,m ^ 1 we have 

Furthermore, there exist a numerical constant C > such that for all 1 

max P ~ > 

ls;ms;[ni/4J \ 16 I 

max P(||[H]„||, > 1/8) ^ C; 
n'^P {{1/2 ^ a^/a^ ^ 3/2}^) < C. 



(C.7) 



(C.8) 
(C.9) 
(C.IO) 



Proof of Lemma C.2. Let $n,m\ge1$ be fixed, denote by $(\lambda_j,e_j)_{1\le j\le m}$ an eigenvalue
decomposition of $[\Gamma]_m$. Define $U_i:=(\sigma\varepsilon_i+\langle\beta-\beta^m,X_i\rangle_{\mathbb H})/\rho_m$ and $V_{ij}:=(\lambda_j^{-1/2}e_j^t[X_i]_m)$, $1\le i\le n$,
$1\le j\le m$, where $U_1,\dots,U_n,V_{11},\dots,V_{nm}$ are independent and standard normally distributed
random variables.

Proof of (C.7) and (C.8). Taking into account XljLi -^j ^ ^II^IIh ^^'^ identities n^p^\\ [^]m||^ 
= iET=l>^JiE^=lU^V^,ff and i[WUT]^nW]rn) / pl = E,"!! (E?=i f^^^^.)' the asser^ 
tions (C.7) and (C.8) follow, respectively, from (C.6) and (C.3) in Lemma C.l (with aj = 
Proof of (C.9). Since n||[S]m,||s ^ 'm'^^^i^j,i^m\Yl^=i(^ij^il ~ obtain due to (C.l) 

and (C.2) in Lemma C.l that for all 77 > 

n 

P{\\[E]rn\\s^v)^ E P{\n-'^{V^jVa-d,i)\^rj/m) 

^^j,l^rn i=l 

{n n 
P{\n~^^yayi2\ > ^/m),P{\n-^'^^{V^i - 1)| ^ n^/^rj/m) I 
i=l i=l ) 

^ ma^ I (1 + ^) exp - ^ min {r^^/m\ 1/4}) , 2 exp - ^ ll^Tm) } ' 
Moreover, for all ^ m/2 the last bound simphfies to 

1 nrj^ 



( 2m 1 
-P(ll["]mlU ^ ??) ^ rn^ maxjl + — ^,2 j exp 



12 



and it is easily seen that the last bound implies (C.9). 

Proof of (C.IO). Since Yi/aY, ■ ■ ■ , Yn/dY are independent and standard normally distributed, 
by exploiting that {1/2 ^ dy/'^Y ^ ^f^V ^ EILi ^iVo"y - 1| > 1/2}, (CIO) follows 

from (C.l) in Lemma C.l, which completes the proof. □ 

Lemma C.3. We have for all c'^ 1 and n, m ^ 1 

4cm| ^ 16exp(— ) +32— exp(-). 

Proof of Lemma C.3. The assertion follows from (C.5) in Lemma C.l and the identity 
n\\ [r]m ^^[H^]m|| = Ej"li(""^/^ TJl=i UiVijf derived in the proof of Lemma C.2. □ 

Lemma C.4. There exists a constant C{d) only depending on d such that for all n'^ 1 
sup sup J2 ^Mmm^'m.-ial^) ^C{d){a' + r)J^n-\ 

Proof of Lemma C.4. The key argument of the proof is the estimate given in Lemma C.3 
with c = A^. Taking into account this upper bound and that for all B G and T £ Q'^ 
the estimates A[ ^ "^d^^l, (1 + logd)-iA^ ^ A^, ^ nCd'^{l + logd) (recall that 5^ = 
mAj^Aj^) and ^ cr^ ^ 2((7^ + 35d^r) (Lemma B.2 (i), (ii) and (iv) respectively) hold true. 
we obtain 



X; AlK([wmi'[W],-4al^^' 

k=m^ ^ 

M' 



n 



^ C{d){a' + r) | J] A^xp - ^^^^_iL_ j + m+ exp (-n/16) 

Finally, exploiting that the constant S satisfies (3.3) and that exp (—n/16) ^ C for all 
n ^ 1 we obtain the assertion of the lemma, which completes the proof. □ 

Lemma C.5. There exist a numerical constant C and a constant C{d) only depending on d 
such that for all n ^ 1 we have 

sup sup \n%M+f max P(^^,„) ] ^ C; (C.ll) 

sup sup {nM+ max P(i^^,„) | ^ C(d); (C.12) 
sup sup{n^P(£:^)} ^ C. (C.13) 

Proof of Lemma C.5. Since M+ ^ [n^/^^J and ?J^_„ = > 1/8} the assertion (C.ll) 

follows from (C.8) in Lemma C.2. Consider (C.12). Let := no{d) := exp(128d6) ^ 8d^, 
and consequently A't' i (M^ log n) ^ 128d^ for all n ^ tIq. We distinguish in the follow- 
ing the cases n < rio and n ^ tIq. First, consider 1 ^ n ^ tTq. Obviously, we have 
M+ maxj^<^^jy^+ P(r2$^ ,j) ^ M+ ^ n~-'^no^^ ^ C{d)n~^ since M+ ^ n-*^/^ and rio depends on d 
only. On the other hand, if n ^ no then from Lemma B.2 (iii) follows n ^ 2 maxj^^^^^+ 1| [F]"^ || , 
and hence I3m,n C ^m,n for all 1 ^ m by employing Lemma B.5. From (C.ll) we con- 

clude M+ max^^^^^+ P{n^,^J ^ M+ max^^^^j^^+ P(?J^,„) ^ Cn'^. By combination of the 
two cases we obtain (C.12). It remains to show (C.13). Consider the events An, Bn and Cn 
defined in (A. 2), where yl„ni3„nCn C £n due to Lemma B.4. Moreover we have n'^P(^^) ^ C, 
n^P(i3^) ^ C, and n^P(C^) ^ C, due to (C.IO), (C.9) and (C.8) in Lemma C.2 respectively 
(keep in mind that [nV^J ^ and 2{a^ + b]fc[r]^ ^[5]^) = <jI ^ pi). Combining these 
estimates we obtain (C.13), which completes the proof. □ 

Proposition C.6. Let k = 96 in the definition of the penalty pen given in (2.11). There exists 
a constant C (d) such that for all 1 we have 

supsupeJ sup {Wk-P''\\l-\penk] \ ^ C{d){a^ + r)^ n-\ 
Proof of Proposition C.6. We use the identity [At - P% = [rj^ M^lfc ^nk,„ In^ „, 
and obtain 

m -P'\\l = \\ [V.]f[f]l'[W],f +mi lni^„ ■ (C.14) 

Exploiting further \\{[ld]k + [^kr%lu,,„ ^ 2, the identity [f]k = l^T IMk + [^Wm^J^ 
and the definition of it follows that || [Va;]fe^^[r]^M^^]fef ^ 4Ar|| [r]^^/^[W^]fef . On 
the other hand, we have || [Va)]fc''^[r]fe """[W^JfeP If^^ „ ^ ^fe'^^ll Prom these estimates and 

ll/S'^llo) ^ WP'^Wb {oi!b~^ is non-increasing due to Assumption 3.1) we deduce for all A; ^ 1 

Taking into account this upper bound, the notations A^ and given in (A.l), and the 
definition penj^ = 96cr^A;A|^A^ri~^ we obtain for all P & J-'l and T e that 



sup (0,- p^Wl -Ipen,) A^E^ ||[r]-V^[W^],f - 4ai ' 



M+ M+ 

Consider the second and third right hand side term. By exploiting, respectively, (C.7) in 
Lemma C.2 and (B.5) in Lemma B.l together with ^ 2{a'^ + 35dPr) (Lemma B.2 (iv)) these 
two terms are bounded by 

6((72 + 35dME||X||VM+ max (P(^^^J) + 34dVM+ max Pin^J. 



Combining this upper bound, the property E||A|p ^ '^S7>i7i ^ dT, and the estimates given 



in Lemma C.5 we deduce for all /3 £ Fl and T & that 



sup sup eJ sup - - \pen^ \ ^ C{d){a^ + r)S n-^+ 



4 sup sup AiE(||[r]-V2r],f-4a^^) 

The result of the proposition follows now by replacing the last right hand side term by its upper 
bound given in Lemma C.4, which completes the proof. □ 

Proposition C.7. Let k = 96 m the definition oj pen and pen given in (2.11) and (2.6) 
respectively. There exists a constant C (d) such that for all 1 we have 

sup sup E(||An - I^Wl ^ C(d)(a2 + r)E n'^ 
Proof of Proposition C.7. Taking into account the decomposition (C.14) and the estimate 
ll[Va;]i^^[r]^n^]fef ^ A;^n2||["W^]fef given in the proof of Proposition C.6 we conclude 

\\A - f3\\l ^ 2Atn^ [W]kf + 2\\/3''\\l + for all k ^ 1. 

By exploiting (B.5) in Lemma B.l together with ||/3*^||a; ^ ll/^'^llb {ujb^^ is non-increasing due 
to Assumption 3.1) we obtain for all /? G J^J^ and T ^ and for all fc ^ 1 that 

II A - PWl ^ 2A%n^\\ [W]kf + 2(34dV + r). 

Since 1 ^ fn ^ and maxi^^^Mj^ ^ n it follows for all P & J^l and T e that 

mdrn - PWl ^ 2n3M- max (E|| [WMY>i£n)\'^' + 2(34dV + r)M-P(i:;j). 

Prom (C.7) in Lemma C.2 together with ^ 2(cr2 + SSd^r) (Lemma B.2) and ]E||X|p ^ 
we conclude for all /3 G J"^ and P G that 

E(||4-/3||2,1^S) ^ 12(^2 + 35dMdSn2M-|P(^;^)|V2 + 2(34dV + r)M-P(^:^). 

The result of the proposition follows now from ^ [n^/'^J and by replacing the probability 
P{£n) by its upper bound Cn~'^ given in Lemma C.5, which completes the proof. □ 

Proof of Proposition 3.1. The assertion follows from Proposition C.6 and Proposition C.7 
and we omit the details. □ 



D Proof of Proposition 3.3 

We assume throughout this section that the conditions of Proposition 3.3 are satisfied which 
allows us to employ the Lemma B.1-B.5 stated in Section B. We formulate first preliminary 
results (Proposition D.l and Lemma D.2- D.5) which rely on the moment conditions imposed 
through Assumption 3.3. They are used below to prove that the Assumptions 2.1 and 2.2 are 
satisfied (Proposition D.6 and D.7 respectively), which is the claim of Proposition 3.3. We 
begin by gathering elementary bounds due to Assumption 3.3. Let k be given by Assumption 
3.3 then for all m ^ 1 we have 

EjZ^I^'^ ^ plrj''\ E\Y\'' ^ atl'r,^^ m^ E|([r]-V2[x]„),f ^ r^^\ 

E\{p - /3-,x)h|'' < ||rV2(/3- - /3)||4V^ E|[x]^[r]-Mx]^|''= < mV'- 

Moreover, if F is a non negative random variable with KV'' < oo then the elementary inequality 
EV l{v^t} ^ t~'''^^EV'^ holds true for alH > 0. Taking into account this estimate we obtain 
under Assumption 3.3, that for all m, n ^ 1 

l{N>ni/6} ^ 

E|(^ - I3'^,X)m\^ l{|(/3-/3'",x)H|>||ri/2(^m_^)l|jj„i/6} ^ ?7^^||r^/^(^"' - /3)||Hn"^, 
E\[X]l[r]^^[XU' l{[x]*,[r]^M^k>W/3} ^ v''m'n-''/^ (D.l) 
and by employing Markov's inequality 

We exploit these bounds in the following proofs. Moreover, the key argument used in the proof 
of Lemma D.3 is the following inequality due to Talagrand [1996] (see e.g. Klein and Rio 
[2005]). 

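Proposition D.1 (Talagrand's Inequality). Let $T_1,\dots,T_n$ be independent $\mathcal T$-valued random
variables and $\nu_s^*=(1/n)\sum_{i=1}^n[\nu_s(T_i)-\mathbb E[\nu_s(T_i)]]$, for $\nu_s$ belonging to a countable class $\{\nu_s:
s\in S\}$ of measurable functions. Then, for $\varepsilon>0$,

$$\mathbb E\Big(\sup_{s\in S}|\nu_s^*|^2-2(1+2\varepsilon)H^2\Big)_+\le C\Big(\frac vn\exp\Big(-K_1\varepsilon\frac{nH^2}{v}\Big)+\frac{h^2}{n^2C^2(\varepsilon)}\exp\Big(-K_2C(\varepsilon)\sqrt\varepsilon\,\frac{nH}{h}\Big)\Big)$$

with $K_1=1/6$, $K_2=1/(21\sqrt2)$, $C(\varepsilon)=\sqrt{1+\varepsilon}-1$ and $C$ a universal constant and where

$$\sup_{s\in S}\sup_{t\in\mathcal T}|\nu_s(t)|\le h,\qquad\mathbb E\Big[\sup_{s\in S}|\nu_s^*|\Big]\le H,\qquad\sup_{s\in S}\frac1n\sum_{i=1}^n\operatorname{Var}(\nu_s(T_i))\le v.$$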



Lemma D.2. There exist a numerical constant C > such that for all 1 

n2supp-^E||[W^]^f ^ Cr^\nX\\l?\ (D-3) 
Mwmfm^ > 1^ < C,«; (D.4) 

Km^L"i/4j \^ Pin ley 

max P(|| > 1/8) ^ C(r?); (D.5) 

nV({l/2 ^ af^/4 ^ 3/2y) ^ Crf\ (D.6) 

Proof of Lemma D.2. Let n, m ^ 1 be fixed, denote by (Aj, ej)i^j^m an eigenvalue decom- 
position of [r]m- Define Ui := {aSi + {13 - , Xi)u) / pm and Vij := (AJ^^^e* [Xjjm), 1 ^ z ^ n, 
1 ^ j ^ m. Keep in mind that E\Ui\'^'' ^ r?'^'', E 114^1^'= ^ r]^'' and El^Jiy^jp ^ r/''^ for some 
k ^ 16 due to Assumption 3.3 and UiVij, . . . ,UnVnj are independent and centered random 
variables for all 1 ^ j ^ m. 

Proof of (D.3) and (D.4). Consider the identities $n^4\rho_m^{-4}\|[W]_m\|^4=(\sum_{j=1}^m\lambda_j(\sum_{i=1}^nU_iV_{ij})^2)^2$
and $n^2([W]_m^t[\Gamma]_m^{-1}[W]_m)/\rho_m^2=\sum_{j=1}^m(\sum_{i=1}^nU_iV_{ij})^2$. We apply successively Minkowski's
(respectively Jensen's) inequality and Theorem 2.10 in Petrov [1995], which leads to



m n 2 r m / n \ l/2-| 2 

j=l i=l '-j=l ^ i=l ^ - 



max ( E\UiVij\ 



1/2-1 2 



.7 = 1 



m 



j=l i=l j=l 1=1 



\2k 



^ C{k)m-^y max E| [7^1/^.1^'' ^ C(fc)r?' 

^ ^ 1 <.i<C'n ' ' 



The first estimate implies (D.3) since X^^j^ Aj ^ E||X|||[. By employing Markov's inequality 
the second estimate with A; = 16 impHes (D.4), that is 



max P 
l^m^[ni/4J I 



Pm 



\m 1 

^ 16 



^ Cn-^S^^ max m^^ ^ Cn'^'^rf^. 

l^m^[ni/4J 



Proof of (D.5). Since VijVn — Sji, . . . , VnjVni — Sji are independent and centered random vari- 
ables with E|Vij\^; — Sjil'^'^ ^ C??^'^ for all 1 ^ j, Z ^ m it follows from Theorem 2.10 in 



Petrov [1995] that n'^E\n X]r=i(^ii^«« ~ <^i«)|^'^ ^ C(A;)t?^'^. By employing the elementary 



inequality 
we conclude 



\yijVil — ^jll'^i Jensen's inequality and the last bound we obtain 



m ^''n''K\\[E]jn\\1'' ^ C{k)r]'^^. Applying Markov's inequahty and the last bound with = 16 



max pf||[S]^||,> ^"j ^Cn-V' 



max w?"^ ^ Cn '^•q 



-8^64 



which proves the assertion (D.5). 

Proof of (D.6). Since Y^/ay — ■ ■ ■ /'^Y — 1 are independent and and centered random 



|2fc 



variables with E|Y^ /cjy — l| ^ C{k)r] it follows from Theorem 2.10 in Petrov [1995] 
that ^n-^J2^^^Y^/a^ - l| ^ C(A;)n-V''- 

Employing Markov's inequality and the last 
bound with k = 16 we deduce ^'(In-^ ELi ^iV^^y - 1| > 1/2) ^ Cn-^^r]^-^. Thereby, the 
assertion (C.IO) follows from the last bound by exploiting that {1/2 ^ S'y/o-y ^ 3/2}'^ C 
{|^~^ Er=i - 1| > 1/2}, which completes the proof. □ 

Lemma D.3. Let := a + ??2||rV2(^m _ p^^^ ^ 

^ 1. There exists a numerical constant C 

such that for all [n^/^J ^ m ^ 1 we have 



E(ll[r|i"=[w„yP-i24^ 



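Proof. Let $1\le m\le n$ be fixed and $\mathbb S^m:=\{z\in\mathbb R^m:z^tz\le1\}$. Define the subsets
$\mathcal E_n:=\{e\in\mathbb R:|e|\le n^{1/6}\}$, $\mathcal X_{1n}:=\{x\in\mathbb H:|\langle\beta-\beta^m,x\rangle_{\mathbb H}|\le\|\Gamma^{1/2}(\beta-\beta^m)\|_{\mathbb H}n^{1/6}\}$,
$\mathcal X_{2n}:=\{x\in\mathbb H:[x]_m^t[\Gamma]_m^{-1}[x]_m\le mn^{1/3}\}$ and $\mathcal X_n:=\mathcal X_{1n}\cap\mathcal X_{2n}$. Given $e\in\mathbb R$, $x\in\mathbb H$ and $s\in\mathbb S^m$
we set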

vs{e,x) := {<Je + {13 - l3'^,x)u)s^[T]-;2'^[x\rnl{eeer.,xeXr.}^ 
Rs{e,x) := {ae + {(3- r,x)u)Ar]^/^[x]rn{l - l{e^e^,.^x„}). 

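Let $\overline{\nu}_s := (1/n)\sum_{i=1}^{n}\{\nu_s(\varepsilon_i, X_i) - \mathbb{E}\nu_s(\varepsilon_i, X_i)\}$ and $\overline{R}_s := (1/n)\sum_{i=1}^{n}\{R_s(\varepsilon_i, X_i) - \mathbb{E}R_s(\varepsilon_i, X_i)\}$, then it is easily seen that $\|[\Gamma]_m^{-1/2}[W_n]_m\|^2 = \sup_{s \in \mathbb{S}^m} |\overline{\nu}_s + \overline{R}_s|^2$ and hence

$$\mathbb{E}\Big(\|[\Gamma]_m^{-1/2}[W_n]_m\|^2 - 12\varsigma_m^2\frac{m}{n}\Big)_+ \leq 2\,\mathbb{E}\Big(\sup_{s \in \mathbb{S}^m}|\overline{\nu}_s|^2 - 6\varsigma_m^2\frac{m}{n}\Big)_+ + 2\,\mathbb{E}\sup_{s \in \mathbb{S}^m}|\overline{R}_s|^2 =: 2\{T_1 + T_2\}, \qquad \text{(D.7)}$$

where we bound the terms $T_1$ and $T_2$ on the right hand side separately.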
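
The decomposition (D.7) rests on the pointwise inequality $(|a + b|^2 - 12c)_+ \leq 2(|a|^2 - 6c)_+ + 2|b|^2$ for real $a$, $b$ and $c \geq 0$, which follows from $(a+b)^2 \leq 2a^2 + 2b^2$. Since it holds pointwise, it carries over to the suprema over $s$ and to expectations. The sketch below checks the scalar inequality on random inputs; here $a$, $b$ and $c$ are placeholders for the truncated part, the remainder part and the centering term, and the helper names are hypothetical.

```python
import random


def positive_part(x: float) -> float:
    """Return max(x, 0)."""
    return max(x, 0.0)


def decomposition_gap(a: float, b: float, c: float) -> float:
    """Right-hand side minus left-hand side of
    (|a + b|^2 - 12 c)_+ <= 2 (|a|^2 - 6 c)_+ + 2 |b|^2."""
    lhs = positive_part((a + b) ** 2 - 12.0 * c)
    rhs = 2.0 * positive_part(a ** 2 - 6.0 * c) + 2.0 * b ** 2
    return rhs - lhs


# Random spot check with a fixed seed; the gap should never be negative
# beyond floating-point rounding.
rng = random.Random(0)
gaps = [
    decomposition_gap(rng.uniform(-5, 5), rng.uniform(-5, 5), rng.uniform(0, 3))
    for _ in range(10_000)
]
print(min(gaps) >= -1e-9)
```

A random check of this kind is only a sanity test, not a proof; the inequality itself is elementary.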

Consider first $T_1$, which we estimate by employing Talagrand's inequality. Obviously, we have

sup sup |z/s(e,x)p = sup (o-e + (^-/3™,x)H)^[a:;]^[r]-^[x]^l{ee£:„,a:eA^„} 

^ (a + ||rV2(/3- _ p>)\\^fn^l^m ^ e^n^l^m =: (D.8) 
By employing the independence of $\varepsilon$ and $X$ it is easily seen that
rzE sup ^a''m + n{P-r,X)^\^[X]lS]^[X]rn, 



1 " 

sup - VVar(z/,(£i,Xi)) ^ + sup - T ,X)u\^\s'[T\;^l\X], 



|2 

, _ , . . . , , _ , ,. , . Jm| 



«=1 

By applying the Cauchy-Schwarz inequality together with E|| [rj^^^^fXj^,!!^ ^ vr?jf andE|(/3- 
r.X)^t ^ l|r'/'(/5"' - we obtain 

E sup ^ "^(a^ + ||rV2(^ _ /3-)||^^4) ^ ^2^"^ ^2^ p_9^ 

5£§m 77, 77, 

and taking in addition into account that E|s*[r]TO,^^^[X]j^|^ ^ for all s G S"* we obtain 

n 

sup - ^ Var(z.,(£i,X,)) ^ + ^V^l\r - fi)\W ^ 4 =■■ v- (D.IO) 
Combining (D.8), (D.9) and (D.10), Talagrand's inequality (Lemma D.1 with $\varepsilon = 1$) yields

+ 
where we used that $m \leq \lfloor n^{1/4} \rfloor$.

Consider $T_2$ on the right hand side of (D.7). By employing $[X]_m^t[\Gamma]_m^{-1}[X]_m 1_{\{X \in \mathcal{X}_{2n}\}} \leq m n^{1/3}$ and $\mathcal{X}_n = \mathcal{X}_{1n} \cap \mathcal{X}_{2n}$ we have



nE sup ^E(a£ + (^-/3'",X)H)'[XUrLi[^k(l-l{ee£„,xeA'4) 

+ mnV3E(c7£ + - X)H)'(l{e0£„} + Mx^n})- 

Taking into account that E{ae + {/3 - /3"^,X)h)^ ^ i<T^ + ||rV2(^ _ )2^4^ Ee^ = 1 and 

E|(/3 - P"',X)m\'^ = ||rV2(^ _ /3m)||2 fj.^^ independence between e and X follows 

nE sup \Rtf ^ + ||rV2(/3-/3-)||2),^2/'^|[^]^^rj-i[^]^|2 J A ^ 

+ mnV3|a2^^2 ^^^^^^^ +||rV2(/3 - /3™)||^P(e £:„) 

We exploit now the estimates given in (D.l) and (D.2). Thereby, we obtain 

nE sup \Rlf ^ Cia^ + \\T^/\p - P'^)\\l)Tj^^mn-'^/^ ^ C^r?^^^-^ 

where we used that $m \leq \lfloor n^{1/4} \rfloor$. Keeping in mind the decomposition (D.7), the last bound and (D.11) together imply the claim of Lemma D.3, which completes the proof. □

Lemma D.4. There exists a constant $K := K(\sigma, \eta, \mathcal{F}, \mathcal{G})$ depending on $\sigma$, $\eta$ and the classes $\mathcal{F}$ and $\mathcal{G}$ only such that for all $n \geq 1$ we have

sup sup Yl ||[r]-i/'[H^„kf-12a^ J ^Kn'^a^ + r)^n-\ 

Proof. We begin our proof with the observation that there exists an integer Ug no{a, r},J^, Q^) 
depending on cr, 77 and the classes Tl and only such that for all n'^ Ug and for all m ^ 
we have ^ ^ 2(a2 + ||rV2/3||2 +[^]t^[r]^Mff]m) = 2(4 + [5]^[r]-i[5]^) = a^. Indeed, we have 
= 0(1) as n — > 00 and \(;!^^ — a'^\ = o(l) as m — )■ 00 because ?m = o" + ri'^\\T^^'^{(3"^ — /3)||h 
and ||ri/2(^m „ ^)||2^ ^ 34(i'^r7„6~i due to (B.5) in Lemma B.l. We distinguish in the fol- 
lowing the cases n < Uq and n ^ Uq- First, consider n < Uq- Due to (D.3) in Lemma D.2 and 
^ 2((t2 + 35d^r) (Lemma B.2 (iv)) we have for all m ^ 1 

^(wV^'iWnW - 12a^^) ^ E||[r]-V2[w^„]^||2 ^ c%\a' + dV)- 
Hence, M+ ^ [n^/^] and mA^ ^ 6^ + ^ nC{d) for all 1 ^ m ^ M+ (Lemma B.2 (ii)) imply 

supsup Al,E(\\[r]-y'[WnU\'-12al'^) ^ n-'C{d) nl/' rj\a' + r). 
The last bound implies the assertion of the lemma for all $1 \leq n < n_0$ because $n_0$ depends on $\sigma$, $\eta$ and the classes $\mathcal{F}$ and $\mathcal{G}$ only. Consider now $n \geq n_0$, where we have $\varsigma_m^2 \leq \sigma_m^2$ for all $m \geq m^{*}$. Thereby, we can apply Lemma D.3, which gives



sup sup Y: A^E(||[r]-V2[H^„]^||2_i2^2^!!^] 

^Csupsup f ^{e^p(-^)+exp(-^)+^}. 

Taking into account the estimates Aj^ ^ ^ (1 + logd)~^A^, M+A^+ ^ 5^+ ^ 

nCd^ (1 + log d) and ^ cr^ ^ 2((T^ + 35(i^r) (Lemma B.2 (i), (ii) and (iv) respectively) follows 



sup sup ^ A^E ||[r]-V2[^^]^||2_i2^2^!!L^ ^C{d){a' + r)n 

/ A7 ( rnKL \^ , n}l\^n 

X sup sup < > A^ exp - — — , + n exp ( - — — ) + - 

/J^^reelLtrio V 6(1 + log im> 7 



>. 



Finally, exploit that S = S(^^) satisfies (3.3) and nexp ( — n^/^/lOO) ^ C which in turn implies 
the claim of the lemma for all n ^ Ug, i.e., 

supsup ^ A^E ||[r]-V2[VF„y|2_i2a^!^L^ ^C{d)rf\a'' + r)^n-\ 

Combining the cases $n < n_0$ and $n \geq n_0$ completes the proof. □

Lemma D.5. There exist a numerical constant $C$ and a constant $C(d)$ only depending on $d$ such that for all $n \geq 1$ we have



sup sup [n\M+f max P{U%,^^) \ ^ Crf^- 
sup sup |nM+ max P(0^_,) | ^ C{d)7]^^; 



sup sup{n^P(f^)} ^ Cri' 



,64 



Proof of Lemma D.5. By employing Lemma D.2 rather than Lemma C.2 the proof of the 
lemma follows along the lines of the proof of Lemma C.5, and we omit the details. □ 



Proposition D.6. Let $\kappa = 288$ in the definition of the penalty pen given in (2.11). There exists a constant $K := K(\sigma, \eta, \mathcal{F}, \mathcal{G})$ depending on $\sigma$, $\eta$ and the classes $\mathcal{F}$ and $\mathcal{G}$ only such that for all $n \geq 1$ we have



supsupEi sup (m-P''\\l-lpenk) \ ^ K rj""^ {a^ + r) E n-\ 
Proof of Proposition D.6. We follow line by line the proof of Proposition C.6. Keeping in mind that pen;, = 288cr^A;A|^A^ri~^ we obtain



sup (wA-n-lpen,) U4 5^ AlE(m-'/'[W],f -Uaf-^) 



M+ Mi 



k=m% k=m^ 

We bound the second and third terms on the right hand side by means of Lemma D.2 and Lemma D.5, i.e.,
sup sup e\ sup (wPk - P'^Wl - ^penk) \ ^ C{d)rf^ {a^ + r)S n"^ 

+ 4 sup sup Y AiE(||[r]-V^[W^],||2-12a^^), 



='7 k=m^ 

and hence by employing the bound given in Lemma D.4 we complete the proof. □ 

Proposition D.7. Let $\kappa = 288$ in the definition of pen and $\widehat{\mathrm{pen}}$ given in (2.11) and (2.6) respectively. There exists a constant $C(d)$ such that for all $n \geq 1$ we have

sup sup E(|| An - PWl l£c) ^ C{d) 776^ {a^ + r) E n-\ 

Proof of Proposition D.7. Taking into account (D.3) in Lemma D.2 rather than (C.7) in Lemma C.2 we follow line by line the proof of Proposition C.7 and conclude that

sup sup E(||^a - PWl Is-) ^ C{d){a^ + r)rfT.n^''^ sup sup \P{£^,

The assertion follows now with help of Lemma D.5, which completes the proof. □



Proof of Proposition 3.3. The assertion follows from Proposition D.6 and Proposition D.7 
and we omit the details. □ 